CasZ compositions and methods of use

ABSTRACT

Provided are compositions and methods that include one or more of: (1) a “CasZ” protein (also referred to as a CasZ polypeptide), a nucleic acid encoding the CasZ protein, and/or a modified host cell comprising the CasZ protein (and/or a nucleic acid encoding the same); (2) a CasZ guide RNA that binds to and provides sequence specificity to the CasZ protein, a nucleic acid encoding the CasZ guide RNA, and/or a modified host cell comprising the CasZ guide RNA (and/or a nucleic acid encoding the same); and (3) a CasZ transactivating noncoding RNA (trancRNA) (referred to herein as a “CasZ trancRNA”), a nucleic acid encoding the CasZ trancRNA, and/or a modified host cell comprising the CasZ trancRNA (and/or a nucleic acid encoding the same).

CROSS-REFERENCE

This application claims the benefit of U.S. Provisional Patent Application No. 62/580,395, filed Nov. 1, 2017, which application is incorporated herein by reference in its entirety.

INCORPORATION BY REFERENCE OF SEQUENCE LISTING PROVIDED AS A TEXT FILE

A Sequence Listing is provided herewith as a text file, “BERK-374WO_SEQLISTING_ST25.txt” created on Oct. 30, 2018 and having a size of 536 KB. The contents of the text file are incorporated by reference herein in their entirety.

INTRODUCTION

The CRISPR-Cas system, an example of a pathway that was unknown to science prior to the DNA sequencing era, is now understood to provide bacteria and archaea with acquired immunity against phage and viruses. Intensive research has uncovered the biochemistry of this system. CRISPR-Cas systems consist of Cas proteins, which are involved in acquisition, targeting and cleavage of foreign DNA or RNA, and a CRISPR array, which includes direct repeats flanking short spacer sequences that guide Cas proteins to their targets. Class 2 CRISPR-Cas systems are streamlined versions in which a single Cas protein bound to RNA is responsible for binding to and cleavage of a targeted sequence. The programmable nature of these minimal systems has facilitated their use as a versatile technology that is revolutionizing the field of genome manipulation.

SUMMARY

The present disclosure provides compositions and methods that include one or more of: (1) a “CasZ” protein (also referred to as a CasZ polypeptide), a nucleic acid encoding the CasZ protein, and/or a modified host cell comprising the CasZ protein (and/or a nucleic acid encoding the same); (2) a CasZ guide RNA that binds to and provides sequence specificity to the CasZ protein, a nucleic acid encoding the CasZ guide RNA, and/or a modified host cell comprising the CasZ guide RNA (and/or a nucleic acid encoding the same); and (3) a CasZ transactivating noncoding RNA (trancRNA) (referred to herein as a “CasZ trancRNA”), a nucleic acid encoding the CasZ trancRNA, and/or a modified host cell comprising the CasZ trancRNA (and/or a nucleic acid encoding the same).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts examples of naturally occurring CasZ protein sequences.

FIG. 2 depicts schematic representations of CasZ loci, which include a Cas1 protein in addition to the CasZ protein.

FIG. 3 depicts a phylogenetic tree of CasZ sequences in relation to other Class 2 CRISPR/Cas effector protein sequences.

FIG. 4 depicts a phylogenetic tree of Cas1 sequences from CasZ loci in relation to Cas1 sequences from other Class 2 CRISPR/Cas loci.

FIG. 5 depicts transcriptomic RNA mapping data demonstrating expression of trancRNA from CasZ loci. The trancRNAs are adjacent to the CasZ repeat array, but do not include the repeat sequence and are not complementary to the repeat sequence. Shown are RNA mapping data for the following loci: CasZa3, CasZb4, CasZc5, CasZd1, and CasZe3. Small repeating aligned arrows represent the repeats of the CRISPR array (indicating the presence of guide RNA-encoding sequence). The peaks outside and adjacent to the repeat arrays represent highly transcribed trancRNAs. FIG. 5 (Cont. 1) nucleotide sequences (Top to Bottom: SEQ ID NOs: 312-331). FIG. 5 (Cont. 3) nucleotide sequences (Top to Bottom: SEQ ID NOs: 161-177).

FIG. 6 depicts results for PAM preferences as assayed using PAM depletion assays for CasZc (top) and CasZb (bottom).

FIG. 7 depicts the sequences of Cas14 proteins described herein.

FIG. 8, Panels A-D depict the architecture and phylogeny of CRISPR-Cas14 genomic loci.

FIG. 9 depicts a phylogenetic analysis of Cas14 orthologs.

FIG. 10 depicts a maximum likelihood tree for Cas1 from known CRISPR systems.

FIG. 11, Panels A-B depict the acquisition of new spacers by CRISPR-Cas14 systems.

FIG. 12, Panels A-D depict that CRISPR-Cas14a actively adapts and encodes a tracrRNA.

FIG. 13, Panels A-B depict metatranscriptomics for CRISPR-Cas14 loci.

FIG. 14, Panels A-B depict RNA processing and heterologous expression by CRISPR-Cas14.

FIG. 15, Panels A-D depict plasmid depletion by Cas14a1 and SpCas9.

FIG. 16, Panels A-D depict that CRISPR-Cas14a is an RNA-guided DNA-endonuclease.

FIG. 17, Panels A-E depict degradation of ssDNA by Cas14a1.

FIG. 18 depicts kinetics of Cas14a1 cleavage of ssDNA with various guide RNA components.

FIG. 19, Panels A-F depict optimization of Cas14a1 guide RNA components.

FIG. 20, Panels A-E depict high fidelity ssDNA SNP detection by CRISPR-Cas14a.

FIG. 20, Panel C provides nucleotide sequences (Top to Bottom: SEQ ID NOs: 367-370).

FIG. 21, Panels A-F depict the impact of various activators on Cas14a1 cleavage rate.

FIG. 22, Panels A-B depict diversity of CRISPR-Cas14 systems.

FIG. 23, Panels A-C depict a test of Cas14a1 mediated interference in a heterologous host. Diagram of Cas14a1 and LbCas12a constructs to test interference in E. coli.

FIG. 24 depicts Cas14 nucleotide sequences of plasmids used in the present invention.

FIG. 25, Panels A-E depict a sequence map of each of the plasmids disclosed in FIG. 24.

DEFINITIONS

“Heterologous,” as used herein, means a nucleotide or polypeptide sequence that is not found in the native nucleic acid or protein, respectively. For example, relative to a CasZ polypeptide, a heterologous polypeptide comprises an amino acid sequence from a protein other than the CasZ polypeptide. In some cases, a portion of a CasZ protein from one species is fused to a portion of a CasZ protein from a different species. The CasZ sequence from each species could therefore be considered heterologous relative to one another. As another example, a CasZ protein (e.g., a dCasZ protein) can be fused to an active domain from a non-CasZ protein (e.g., a histone deacetylase), and the sequence of the active domain could be considered a heterologous polypeptide (it is heterologous to the CasZ protein).

The terms “polynucleotide” and “nucleic acid,” used interchangeably herein, refer to a polymeric form of nucleotides of any length, either ribonucleotides or deoxynucleotides. Thus, this term includes, but is not limited to, single-, double-, or multi-stranded DNA or RNA, genomic DNA, cDNA, DNA-RNA hybrids, or a polymer comprising purine and pyrimidine bases or other natural, chemically or biochemically modified, non-natural, or derivatized nucleotide bases. The terms “polynucleotide” and “nucleic acid” should be understood to include, as applicable to the embodiment being described, single-stranded (such as sense or antisense) and double-stranded polynucleotides.

The terms “polypeptide,” “peptide,” and “protein,” used interchangeably herein, refer to a polymeric form of amino acids of any length, which can include genetically coded and non-genetically coded amino acids, chemically or biochemically modified or derivatized amino acids, and polypeptides having modified peptide backbones. The term includes fusion proteins, including, but not limited to, fusion proteins with a heterologous amino acid sequence, fusions with heterologous and homologous leader sequences, with or without N-terminal methionine residues; immunologically tagged proteins; and the like.

The term “naturally-occurring” as used herein as applied to a nucleic acid, a protein, a cell, or an organism, refers to a nucleic acid, cell, protein, or organism that is found in nature.

As used herein the term “isolated” is meant to describe a polynucleotide, a polypeptide, or a cell that is in an environment different from that in which the polynucleotide, the polypeptide, or the cell naturally occurs. An isolated genetically modified host cell may be present in a mixed population of genetically modified host cells.

As used herein, the term “exogenous nucleic acid” refers to a nucleic acid that is not normally or naturally found in and/or produced by a given bacterium, organism, or cell in nature. As used herein, the term “endogenous nucleic acid” refers to a nucleic acid that is normally found in and/or produced by a given bacterium, organism, or cell in nature. An “endogenous nucleic acid” is also referred to as a “native nucleic acid” or a nucleic acid that is “native” to a given bacterium, organism, or cell.

“Recombinant,” as used herein, means that a particular nucleic acid (DNA or RNA) is the product of various combinations of cloning, restriction, and/or ligation steps resulting in a construct having a structural coding or non-coding sequence distinguishable from endogenous nucleic acids found in natural systems. Generally, DNA sequences encoding the structural coding sequence can be assembled from cDNA fragments and short oligonucleotide linkers, or from a series of synthetic oligonucleotides, to provide a synthetic nucleic acid which is capable of being expressed from a recombinant transcriptional unit contained in a cell or in a cell-free transcription and translation system. Such sequences can be provided in the form of an open reading frame uninterrupted by internal non-translated sequences, or introns, which are typically present in eukaryotic genes. Genomic DNA comprising the relevant sequences can also be used in the formation of a recombinant gene or transcriptional unit. Sequences of non-translated DNA may be present 5′ or 3′ from the open reading frame, where such sequences do not interfere with manipulation or expression of the coding regions, and may indeed act to modulate production of a desired product by various mechanisms (see “DNA regulatory sequences”, below).

Thus, e.g., the term “recombinant” polynucleotide or “recombinant” nucleic acid refers to one which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of sequence through human intervention. This artificial combination is often accomplished by either chemical synthesis means, or by the artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques. Such is usually done to replace a codon with a redundant codon encoding the same or a conservative amino acid, while typically introducing or removing a sequence recognition site. Alternatively, it is performed to join together nucleic acid segments of desired functions to generate a desired combination of functions.

Similarly, the term “recombinant” polypeptide refers to a polypeptide which is not naturally occurring, e.g., is made by the artificial combination of two otherwise separated segments of amino acid sequence through human intervention. Thus, e.g., a polypeptide that comprises a heterologous amino acid sequence is recombinant.

By “construct” or “vector” is meant a recombinant nucleic acid, generally recombinant DNA, which has been generated for the purpose of the expression and/or propagation of a specific nucleotide sequence(s), or is to be used in the construction of other recombinant nucleotide sequences.

The terms “DNA regulatory sequences,” “control elements,” and “regulatory elements,” used interchangeably herein, refer to transcriptional and translational control sequences, such as promoters, enhancers, polyadenylation signals, terminators, protein degradation signals, and the like, that provide for and/or regulate expression of a coding sequence and/or production of an encoded polypeptide in a host cell.

The term “transformation” is used interchangeably herein with “genetic modification” and refers to a permanent or transient genetic change induced in a cell following introduction of new nucleic acid (e.g., DNA exogenous to the cell) into the cell. Genetic change (“modification”) can be accomplished either by incorporation of the new nucleic acid into the genome of the host cell, or by transient or stable maintenance of the new nucleic acid as an episomal element. Where the cell is a eukaryotic cell, a permanent genetic change is generally achieved by introduction of new DNA into the genome of the cell. In prokaryotic cells, permanent changes can be introduced into the chromosome or via extrachromosomal elements such as plasmids and expression vectors, which may contain one or more selectable markers to aid in their maintenance in the recombinant host cell. Suitable methods of genetic modification include viral infection, transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation, direct microinjection, and the like. The choice of method is generally dependent on the type of cell being transformed and the circumstances under which the transformation is taking place (i.e., in vitro, ex vivo, or in vivo). A general discussion of these methods can be found in Ausubel, et al., Short Protocols in Molecular Biology, 3rd ed., Wiley & Sons, 1995.

“Operably linked” refers to a juxtaposition wherein the components so described are in a relationship permitting them to function in their intended manner. For instance, a promoter is operably linked to a coding sequence if the promoter affects its transcription or expression. As used herein, the terms “heterologous promoter” and “heterologous control regions” refer to promoters and other control regions that are not normally associated with a particular nucleic acid in nature. For example, a “transcriptional control region heterologous to a coding region” is a transcriptional control region that is not normally associated with the coding region in nature.

A “host cell,” as used herein, denotes an in vivo or in vitro eukaryotic cell, a prokaryotic cell, or a cell from a multicellular organism (e.g., a cell line) cultured as a unicellular entity, which eukaryotic or prokaryotic cells can be, or have been, used as recipients for a nucleic acid (e.g., an expression vector), and include the progeny of the original cell which has been genetically modified by the nucleic acid. It is understood that the progeny of a single cell may not necessarily be completely identical in morphology or in genomic or total DNA complement as the original parent, due to natural, accidental, or deliberate mutation. A “recombinant host cell” (also referred to as a “genetically modified host cell”) is a host cell into which has been introduced a heterologous nucleic acid, e.g., an expression vector. For example, a subject prokaryotic host cell is a genetically modified prokaryotic host cell (e.g., a bacterium), by virtue of introduction into a suitable prokaryotic host cell of a heterologous nucleic acid, e.g., an exogenous nucleic acid that is foreign to (not normally found in nature in) the prokaryotic host cell, or a recombinant nucleic acid that is not normally found in the prokaryotic host cell; and a subject eukaryotic host cell is a genetically modified eukaryotic host cell, by virtue of introduction into a suitable eukaryotic host cell of a heterologous nucleic acid, e.g., an exogenous nucleic acid that is foreign to the eukaryotic host cell, or a recombinant nucleic acid that is not normally found in the eukaryotic host cell.

The term “conservative amino acid substitution” refers to the interchangeability in proteins of amino acid residues having similar side chains. For example, a group of amino acids having aliphatic side chains consists of glycine, alanine, valine, leucine, and isoleucine; a group of amino acids having aliphatic-hydroxyl side chains consists of serine and threonine; a group of amino acids having amide-containing side chains consists of asparagine and glutamine; a group of amino acids having aromatic side chains consists of phenylalanine, tyrosine, and tryptophan; a group of amino acids having basic side chains consists of lysine, arginine, and histidine; and a group of amino acids having sulfur-containing side chains consists of cysteine and methionine. Exemplary conservative amino acid substitution groups are: valine-leucine-isoleucine, phenylalanine-tyrosine, lysine-arginine, alanine-valine, and asparagine-glutamine.
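As a non-limiting illustration, the side-chain groups listed above can be encoded as a lookup. The following is a minimal sketch; the names SIDE_CHAIN_GROUPS and is_conservative are hypothetical, and treating any two distinct residues of the same side-chain group as a conservative substitution is an assumption made for illustration (the exemplary substitution groups named above are narrower).

```python
# Side-chain groups as listed in the definition above (one-letter codes).
# Treating membership in the same group as "conservative" is an
# illustrative assumption, not a definition taken from the disclosure.
SIDE_CHAIN_GROUPS = {
    "aliphatic": set("GAVLI"),
    "aliphatic-hydroxyl": set("ST"),
    "amide-containing": set("NQ"),
    "aromatic": set("FYW"),
    "basic": set("KRH"),
    "sulfur-containing": set("CM"),
}


def is_conservative(original, replacement):
    """Return True if two different residues share a side-chain group."""
    a, b = original.upper(), replacement.upper()
    if a == b:
        return False  # identical residues are not a substitution
    return any(a in group and b in group for group in SIDE_CHAIN_GROUPS.values())
```

Under these assumptions, is_conservative("K", "R") is true (both basic), whereas is_conservative("K", "D") is false.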

A polynucleotide or polypeptide has a certain percent “sequence identity” to another polynucleotide or polypeptide, meaning that, when aligned, that percentage of bases or amino acids are the same, and in the same relative position, when comparing the two sequences. Sequence similarity can be determined in a number of different manners. To determine sequence identity, sequences can be aligned using the methods and computer programs, including BLAST, available over the world wide web at ncbi.nlm.nih.gov/BLAST. See, e.g., Altschul et al. (1990), J. Mol. Biol. 215:403-10. Another alignment algorithm is FASTA, available in the Genetics Computing Group (GCG) package, from Madison, Wis., USA, a wholly owned subsidiary of Oxford Molecular Group, Inc. Other techniques for alignment are described in Methods in Enzymology, vol. 266: Computer Methods for Macromolecular Sequence Analysis (1996), ed. Doolittle, Academic Press, Inc., a division of Harcourt Brace & Co., San Diego, Calif., USA. Of particular interest are alignment programs that permit gaps in the sequence. The Smith-Waterman algorithm is one type of algorithm that permits gaps in sequence alignments. See Meth. Mol. Biol. 70:173-187 (1997). Also, the GAP program using the Needleman and Wunsch alignment method can be utilized to align sequences. See J. Mol. Biol. 48:443-453 (1970).
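As a non-limiting illustration of the percent identity calculation, the following minimal sketch operates on two sequences that have already been aligned (e.g., by one of the programs noted above) and padded with gap characters to equal length. The function name percent_identity, the "-" gap symbol, and the choice of denominator (all alignment columns in which at least one sequence has a residue) are assumptions made for illustration; alignment programs may report identity over a different denominator.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of alignment columns holding the same residue in both sequences.

    Both inputs must already be aligned and therefore of equal length.
    Columns that are gaps in both sequences are ignored (assumption).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if not (x == gap and y == gap)
    ]
    if not columns:
        return 0.0
    # A column counts as identical only if both positions hold the same residue.
    matches = sum(1 for x, y in columns if x == y and x != gap)
    return 100.0 * matches / len(columns)
```

For example, under these assumptions the aligned pair "ACGT" and "ACGA" shares three of four columns, giving 75 percent identity.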

As used herein, the terms “treatment,” “treating,” and the like, refer to obtaining a desired pharmacologic and/or physiologic effect. The effect may be prophylactic in terms of completely or partially preventing a disease or symptom thereof and/or may be therapeutic in terms of a partial or complete cure for a disease and/or adverse effect attributable to the disease. “Treatment,” as used herein, covers any treatment of a disease in a mammal, e.g., in a human, and includes: (a) preventing the disease from occurring in a subject which may be predisposed to the disease but has not yet been diagnosed as having it; (b) inhibiting the disease, i.e., arresting its development; and (c) relieving the disease, i.e., causing regression of the disease.

The terms “individual,” “subject,” “host,” and “patient,” used interchangeably herein, refer to an individual organism, e.g., a mammal, including, but not limited to, murines, simians, non-human primates, humans, mammalian farm animals, mammalian sport animals, and mammalian pets.

Before the present invention is further described, it is to be understood that this invention is not limited to particular embodiments described, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only, and is not intended to be limiting, since the scope of the present invention will be limited only by the appended claims.

Where a range of values is provided, it is understood that each intervening value, to the tenth of the unit of the lower limit unless the context clearly dictates otherwise, between the upper and lower limit of that range and any other stated or intervening value in that stated range, is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included in the smaller ranges, and are also encompassed within the invention, subject to any specifically excluded limit in the stated range. Where the stated range includes one or both of the limits, ranges excluding either or both of those included limits are also included in the invention.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although any methods and materials similar or equivalent to those described herein can also be used in the practice or testing of the present invention, the preferred methods and materials are now described. All publications mentioned herein are incorporated herein by reference to disclose and describe the methods and/or materials in connection with which the publications are cited.

It must be noted that as used herein and in the appended claims, the singular forms “a,” “an,” and “the” include plural referents unless the context clearly dictates otherwise. Thus, for example, reference to “a CasZ polypeptide” includes a plurality of such polypeptides and reference to “the guide RNA” includes reference to one or more guide RNAs and equivalents thereof known to those skilled in the art, and so forth. It is further noted that the claims may be drafted to exclude any optional element. As such, this statement is intended to serve as antecedent basis for use of such exclusive terminology as “solely,” “only” and the like in connection with the recitation of claim elements, or use of a “negative” limitation.

It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable sub-combination. All combinations of the embodiments pertaining to the invention are specifically embraced by the present invention and are disclosed herein just as if each and every combination was individually and explicitly disclosed. In addition, all sub-combinations of the various embodiments and elements thereof are also specifically embraced by the present invention and are disclosed herein just as if each and every such sub-combination was individually and explicitly disclosed herein.

The publications discussed herein are provided solely for their disclosure prior to the filing date of the present application. Nothing herein is to be construed as an admission that the present invention is not entitled to antedate such publication by virtue of prior invention. Further, the dates of publication provided may be different from the actual publication dates which may need to be independently confirmed.

DETAILED DESCRIPTION

The present disclosure provides compositions and methods that include one or more of: (1) a “CasZ” protein (also referred to as a CasZ polypeptide), a nucleic acid encoding the CasZ protein, and/or a modified host cell comprising the CasZ protein (and/or a nucleic acid encoding the same); (2) a CasZ guide RNA that binds to and provides sequence specificity to the CasZ protein, a nucleic acid encoding the CasZ guide RNA, and/or a modified host cell comprising the CasZ guide RNA (and/or a nucleic acid encoding the same); and (3) a CasZ transactivating noncoding RNA (trancRNA) (referred to herein as a “CasZ trancRNA”), a nucleic acid encoding the CasZ trancRNA, and/or a modified host cell comprising the CasZ trancRNA (and/or a nucleic acid encoding the same).

Compositions

CRISPR/CasZ Proteins, Guide RNAs, and trancRNAs

Class 2 CRISPR-Cas systems are characterized by effector modules that include a single multidomain protein. In the CasZ system, a CRISPR/Cas endonuclease (e.g., a CasZ protein) interacts with (binds to) a corresponding guide RNA (e.g., a CasZ guide RNA) to form a ribonucleoprotein (RNP) complex that is targeted to a particular site in a target nucleic acid via base pairing between the guide RNA and a target sequence within the target nucleic acid molecule. A guide RNA includes a nucleotide sequence (a guide sequence) that is complementary to a sequence (the target site) of a target nucleic acid. Thus, a CasZ protein forms a complex with a CasZ guide RNA and the guide RNA provides sequence specificity to the RNP complex via the guide sequence. The CasZ protein of the complex provides the site-specific activity. In other words, the CasZ protein is guided to a target site (e.g., stabilized at a target site) within a target nucleic acid (e.g., a target nucleotide sequence within a target chromosomal nucleic acid; or a target nucleotide sequence within a target extrachromosomal nucleic acid, e.g., an episomal nucleic acid, a minicircle nucleic acid, a mitochondrial nucleic acid, a chloroplast nucleic acid, etc.) by virtue of its association with the guide RNA.
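As a non-limiting illustration of how a guide sequence provides sequence specificity through base pairing, the following minimal sketch derives the DNA target site that is fully complementary to a given RNA guide sequence and locates it on one DNA strand. The function names and the example sequences are hypothetical. The sketch assumes perfect Watson-Crick pairing only and does not model PAM requirements, mismatch tolerance, the protein-binding segment of the guide RNA, or the trancRNA.

```python
# RNA base -> DNA base that pairs with it (Watson-Crick pairing only).
RNA_TO_PAIRED_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def target_site_for_guide(guide_rna):
    """Return the DNA sequence (5'->3') fully complementary to the guide.

    Base-paired strands are antiparallel, so the guide is read in reverse.
    """
    return "".join(RNA_TO_PAIRED_DNA[base] for base in reversed(guide_rna.upper()))


def find_target_sites(dna_strand, guide_rna):
    """Return 0-based start positions of complementary sites on one strand."""
    site = target_site_for_guide(guide_rna)
    strand = dna_strand.upper()
    positions = []
    start = strand.find(site)
    while start != -1:
        positions.append(start)
        start = strand.find(site, start + 1)
    return positions
```

Searching only one strand is a simplification; a fuller treatment would also search the opposite strand of a double stranded target.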

The present disclosure provides compositions comprising a CasZ polypeptide (and/or a nucleic acid encoding the CasZ polypeptide) (e.g., where the CasZ polypeptide can be a naturally existing CasZ protein, a nickase CasZ protein, a dCasZ protein, a chimeric CasZ protein, etc.; a CasZa, CasZb, CasZc, CasZd, CasZe, CasZf, CasZg, CasZh, CasZi, CasZj, CasZk, or CasZl protein). The present disclosure provides compositions comprising a CasZ guide RNA (and/or a nucleic acid encoding the CasZ guide RNA). For example, the present disclosure provides compositions comprising (a) a CasZ polypeptide (and/or a nucleic acid encoding the CasZ polypeptide) and (b) a CasZ guide RNA (and/or a nucleic acid encoding the CasZ guide RNA). The present disclosure provides a nucleic acid/protein complex (RNP complex) comprising: (a) a CasZ polypeptide; and (b) a CasZ guide RNA. The present disclosure provides compositions comprising a CasZ trancRNA. The present disclosure provides compositions comprising a CasZ trancRNA and one or more of: (a) a CasZ protein, and (b) a CasZ guide RNA (e.g., comprising a CasZ trancRNA and a CasZ protein, a CasZ trancRNA and a CasZ guide RNA, or a CasZ trancRNA and a CasZ protein and a CasZ guide RNA). The present disclosure provides a nucleic acid/protein complex (RNP complex) comprising: (a) a CasZ polypeptide; (b) a CasZ guide RNA; and (c) a CasZ trancRNA. The present disclosure provides compositions comprising a CasZ protein and one or more of: (a) a CasZ trancRNA, and (b) a CasZ guide RNA.

CasZ Protein

A CasZ polypeptide (this term is used interchangeably with the term “CasZ protein”, “Cas14”, “Cas14 polypeptide”, or “Cas14 protein”) can bind and/or modify (e.g., cleave, nick, methylate, demethylate, etc.) a target nucleic acid and/or a polypeptide associated with target nucleic acid (e.g., methylation or acetylation of a histone tail) (e.g., in some cases the CasZ protein includes a fusion partner with an activity, and in some cases the CasZ protein provides nuclease activity). In some cases, the CasZ protein is a naturally-occurring protein (e.g., naturally occurs in prokaryotic cells). In other cases, the CasZ protein is not a naturally-occurring polypeptide (e.g., the CasZ protein is a variant CasZ protein, a chimeric protein, and the like). A CasZ protein includes 3 partial RuvC domains (RuvC-I, RuvC-II, and RuvC-III, also referred to herein as subdomains) that are not contiguous with respect to the primary amino acid sequence of the CasZ protein, but form a RuvC domain once the protein is produced and folds. A naturally occurring CasZ protein functions as an endonuclease that catalyzes cleavage at a specific sequence in a targeted nucleic acid (e.g., a double stranded DNA (dsDNA)). The sequence specificity is provided by the associated guide RNA, which hybridizes to a target sequence within the target DNA. The naturally occurring CasZ guide RNA is a crRNA, where the crRNA includes (i) a guide sequence that hybridizes to a target sequence in the target DNA and (ii) a protein binding segment that binds to the CasZ protein.

In some embodiments, the CasZ protein of the subject methods and/or compositions is (or is derived from) a naturally occurring (wild type) protein. Examples of naturally occurring CasZ proteins (e.g., CasZa, CasZb, CasZc, CasZd, CasZe, CasZf, CasZg, CasZh, CasZi, CasZj, CasZk, CasZl) are depicted in FIG. 1. In some cases, a subject CasZ protein is a CasZa protein. In some cases, a subject CasZ protein is a CasZb protein. In some cases, a subject CasZ protein is a CasZc protein. In some cases, a subject CasZ protein is a CasZd protein. In some cases, a subject CasZ protein is a CasZe protein. In some cases, a subject CasZ protein is a CasZf protein. In some cases, a subject CasZ protein is a CasZg protein. In some cases, a subject CasZ protein is a CasZh protein. In some cases, a subject CasZ protein is a CasZi protein. In some cases, a subject CasZ protein is a CasZj protein. In some cases, a subject CasZ protein is a CasZk protein. In some cases, a subject CasZ protein is a CasZl protein. In some cases, a subject CasZ protein is a CasZe, CasZf, CasZg, or CasZh protein. In some cases, a subject CasZ protein is a CasZj, CasZk, or CasZl protein.

It is important to note that this newly discovered protein (CasZ) is short compared to previously identified CRISPR-Cas endonucleases, and thus use of this protein as an alternative provides the advantage that the nucleotide sequence encoding the protein is relatively short. This is useful, for example, in cases where a nucleic acid encoding the CasZ protein is desirable, e.g., in situations that employ a viral vector (e.g., an AAV vector), for delivery to a cell such as a eukaryotic cell (e.g., mammalian cell, human cell, mouse cell, in vitro, ex vivo, in vivo) for research and/or clinical applications. In addition, in their natural context, the CasZ-encoding DNA sequences are present in loci that also have a Cas1 protein.

In some cases, a subject CasZ protein has a length of 900 amino acids or less (e.g., 850 amino acids or less, 800 amino acids or less, 750 amino acids or less, or 700 amino acids or less). In some cases, a subject CasZ protein has a length of 850 amino acids or less (e.g., 850 amino acids or less). In some cases, a subject CasZ protein has a length of 800 amino acids or less (e.g., 750 amino acids or less). In some cases, a subject CasZ protein has a length of 700 amino acids or less. In some cases, a subject CasZ protein has a length of 650 amino acids or less.

In some cases, a subject CasZ protein has a length in a range of from350-900 amino acids (e.g., 350-850, 350-800, 350-750, 350-700, 400-900,400-850, 400-800, 400-750, or 400-700 amino acids).

In some cases, a subject CasZ protein (e.g., CasZa) has a length in arange of from 350-750 amino acids (e.g., 350-700, 350-550, 450-550,450-750, 450-650, or 450-550 amino acids). In some cases, a subject CasZprotein (e.g., CasZa) has a length in a range of from 450-750 aminoacids (e.g., 500-700 amino acids). In some cases, a subject CasZ protein(e.g., CasZa) has a length in a range of from 350-700 amino acids (e.g.,350-650, 350-600, or 350-550 amino acids). In some cases, a subject CasZprotein (e.g., CasZa) has a length in a range of from 500-700 aminoacids. In some cases, a subject CasZ protein (e.g., CasZa) has a lengthin a range of from 450-550 amino acids. In some cases, a subject CasZprotein (e.g., CasZa) has a length in a range of from 350-550 aminoacids.

In some cases, a subject CasZ protein (e.g., CasZb) has a length in arange of from 350-700 amino acids (e.g., 350-650, or 350-620 aminoacids). In some cases, a subject CasZ protein (e.g., CasZb) has a lengthin a range of from 450-700 amino acids (e.g., 450-650, 500-650 or500-620 amino acids). In some cases, a subject CasZ protein (e.g.,CasZb) has a length in a range of from 500-650 amino acids (e.g.,500-620 amino acids). In some cases, a subject CasZ protein (e.g.,CasZb) has a length in a range of from 500-620 amino acids.

In some cases, a subject CasZ protein (e.g., CasZc) has a length in arange of from 600-800 amino acids (e.g., 600-650 or 700-800 aminoacids). In some cases, a subject CasZ protein (e.g., CasZc) has a lengthin a range of from 600-650 amino acids. In some cases, a subject CasZprotein (e.g., CasZc) has a length in a range of from 700-800 aminoacids.

In some cases, a subject CasZ protein (e.g., CasZd) has a length in a range of from 400-650 amino acids (e.g., 400-600, 400-550, 500-650, 500-600, or 500-550 amino acids). In some cases, a subject CasZ protein (e.g., CasZd) has a length in a range of from 500-600 amino acids. In some cases, a subject CasZ protein (e.g., CasZd) has a length in a range of from 500-550 amino acids. In some cases, a subject CasZ protein (e.g., CasZd) has a length in a range of from 400-550 amino acids.

In some cases, a subject CasZ protein (e.g., CasZe) has a length in a range of from 450-700 amino acids (e.g., 450-650, 450-615, 475-700, 475-650, or 475-615 amino acids). In some cases, a subject CasZ protein (e.g., CasZe) has a length in a range of from 450-675 amino acids. In some cases, a subject CasZ protein (e.g., CasZe) has a length in a range of from 475-675 amino acids.

In some cases, a subject CasZ protein (e.g., CasZf) has a length in a range of from 400-550 amino acids (e.g., 400-520, 400-500, 400-475, 415-550, 415-520, 415-500, or 415-475 amino acids). In some cases, a subject CasZ protein (e.g., CasZf) has a length in a range of from 400-475 amino acids (e.g., 400-450 amino acids).

In some cases, a subject CasZ protein (e.g., CasZg) has a length in a range of from 500-750 amino acids (e.g., 550-750 or 500-700 amino acids). In some cases, a subject CasZ protein (e.g., CasZg) has a length in a range of from 700-750 amino acids. In some cases, a subject CasZ protein (e.g., CasZg) has a length in a range of from 550-600 amino acids.

In some cases, a subject CasZ protein (e.g., CasZh) has a length in a range of from 380-450 amino acids (e.g., 380-420, 400-450, or 400-420 amino acids). In some cases, a subject CasZ protein (e.g., CasZh) has a length in a range of from 400-420 amino acids.

In some cases, a subject CasZ protein (e.g., CasZi) has a length in a range of from 700-800 amino acids (e.g., 700-750, 720-800, or 720-750 amino acids). In some cases, a subject CasZ protein (e.g., CasZi) has a length in a range of from 720-780 amino acids.

In some cases, a subject CasZ protein (e.g., CasZj) has a length in a range of from 600-750 amino acids (e.g., 600-700 or 650-700 amino acids). In some cases, a subject CasZ protein (e.g., CasZj) has a length in a range of from 400-420 amino acids.

In some cases, a subject CasZ protein (e.g., CasZk) has a length in a range of from 450-600 amino acids (e.g., 450-580, 480-600, 480-580, or 500-600 amino acids). In some cases, a subject CasZ protein (e.g., CasZk) has a length in a range of from 480-580 amino acids.

In some cases, a subject CasZ protein (e.g., CasZl) has a length in a range of from 350-500 amino acids (e.g., 350-450, 380-450, 350-420, or 380-420 amino acids). In some cases, a subject CasZ protein (e.g., CasZl) has a length in a range of from 380-420 amino acids.

In some cases, a subject CasZ protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZa protein of FIG. 1 or FIG. 7. For example, in some cases, a subject CasZ protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZa protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZa protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZa protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZa amino acid sequence of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZa amino acid sequence of FIG. 1 or FIG. 7, with the exception that the sequence includes one or more amino acid substitutions (e.g., 1, 2, or 3 amino acid substitutions) that reduce the naturally occurring catalytic activity of the protein (e.g., such as at one or more catalytic amino acid positions) (e.g., in some cases such that the CasZ protein is a dCasZ). In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZa protein of FIG. 1 or FIG. 7 and has a length in a range of from 350-800 amino acids (e.g., 350-750, 350-700, 350-550, 450-550, 450-750, or 450-650 amino acids).

In some cases, a subject CasZ protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZb protein of FIG. 1 or FIG. 7. For example, in some cases, a subject CasZ protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZb protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZb protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZb protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZb amino acid sequence of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZb amino acid sequence of FIG. 1 or FIG. 7, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein (e.g., such as at one or more catalytic amino acid positions) (e.g., in some cases such that the CasZ protein is a dCasZ). In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZb protein of FIG. 1 or FIG. 7 and has a length in a range of from 350-700 amino acids (e.g., 350-650 or 350-620 amino acids).

In some cases, a subject CasZ protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZc protein of FIG. 1 or FIG. 7. For example, in some cases, a subject CasZ protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZc protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZc protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZc protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZc amino acid sequence of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZc amino acid sequence of FIG. 1 or FIG. 7, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein (e.g., such as at one or more catalytic amino acid positions) (e.g., in some cases such that the CasZ protein is a dCasZ). In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZc protein of FIG. 1 or FIG. 7 and has a length in a range of from 600-800 amino acids (e.g., 600-650 or 700-800 amino acids).

In some cases, a subject CasZ protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZd protein of FIG. 1 or FIG. 7. For example, in some cases, a subject CasZ protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZd protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZd protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZd protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZd amino acid sequence of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZd amino acid sequence of FIG. 1 or FIG. 7, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein (e.g., such as at one or more catalytic amino acid positions) (e.g., in some cases such that the CasZ protein is a dCasZ). In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZd protein of FIG. 1 or FIG. 7 and has a length in a range of from 400-650 amino acids (e.g., 400-600, 400-550, 500-650, 500-600, or 500-550 amino acids).

In some cases, a subject CasZ protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZe protein of FIG. 1 or FIG. 7. For example, in some cases, a subject CasZ protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZe protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZe protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZe protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZe amino acid sequence of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZe amino acid sequence of FIG. 1 or FIG. 7, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein (e.g., such as at one or more catalytic amino acid positions) (e.g., in some cases such that the CasZ protein is a dCasZ). In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZe protein of FIG. 1 or FIG. 7 and has a length in a range of from 450-700 amino acids (e.g., 450-650, 450-615, 475-700, 475-650, or 475-615 amino acids).

In some cases, a subject CasZ protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZf protein of FIG. 1 or FIG. 7. For example, in some cases, a subject CasZ protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZf protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZf protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZf protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZf amino acid sequence of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZf amino acid sequence of FIG. 1 or FIG. 7, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein (e.g., such as at one or more catalytic amino acid positions) (e.g., in some cases such that the CasZ protein is a dCasZ). In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZf protein of FIG. 1 or FIG. 7 and has a length in a range of from 400-750 amino acids (e.g., 400-700, 400-650, 400-620, 400-600, 400-550, 400-520, 400-500, 400-475, 415-550, 415-520, 415-500, or 415-475 amino acids).

In some cases, a subject CasZ protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZg protein of FIG. 1 or FIG. 7. For example, in some cases, a subject CasZ protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZg protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZg protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZg protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZg amino acid sequence of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZg amino acid sequence of FIG. 1 or FIG. 7, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein (e.g., such as at one or more catalytic amino acid positions) (e.g., in some cases such that the CasZ protein is a dCasZ). In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZg protein of FIG. 1 or FIG. 7 and has a length in a range of from 500-750 amino acids (e.g., 550-750 amino acids).

In some cases, a subject CasZ protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZh protein of FIG. 1 or FIG. 7. For example, in some cases, a subject CasZ protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZh protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZh protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZh protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZh amino acid sequence of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZh amino acid sequence of FIG. 1 or FIG. 7, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein (e.g., such as at one or more catalytic amino acid positions) (e.g., in some cases such that the CasZ protein is a dCasZ). In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZh protein of FIG. 1 or FIG. 7 and has a length in a range of from 380-450 amino acids (e.g., 380-420, 400-450, or 400-420 amino acids).

In some cases, a subject CasZ protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZi protein of FIG. 1 or FIG. 7. For example, in some cases, a subject CasZ protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZi protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZi protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZi protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZi amino acid sequence of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZi amino acid sequence of FIG. 1 or FIG. 7, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein (e.g., such as at one or more catalytic amino acid positions) (e.g., in some cases such that the CasZ protein is a dCasZ). In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZi protein of FIG. 1 or FIG. 7 and has a length in a range of from 700-800 amino acids (e.g., 700-750, 720-800, or 720-750 amino acids).

In some cases, a subject CasZ protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZj protein of FIG. 1 or FIG. 7. For example, in some cases, a subject CasZ protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZj protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZj protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZj protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZj amino acid sequence of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZj amino acid sequence of FIG. 1 or FIG. 7, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein (e.g., such as at one or more catalytic amino acid positions) (e.g., in some cases such that the CasZ protein is a dCasZ). In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZj protein of FIG. 1 or FIG. 7 and has a length in a range of from 600-750 amino acids (e.g., 600-700 or 650-700 amino acids).

In some cases, a subject CasZ protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZk protein of FIG. 1 or FIG. 7. For example, in some cases, a subject CasZ protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZk protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZk protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZk protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZk amino acid sequence of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZk amino acid sequence of FIG. 1 or FIG. 7, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein (e.g., such as at one or more catalytic amino acid positions) (e.g., in some cases such that the CasZ protein is a dCasZ). In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZk protein of FIG. 1 or FIG. 7 and has a length in a range of from 450-600 amino acids (e.g., 450-580, 480-600, 480-580, or 500-600 amino acids).

In some cases, a subject CasZ protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZl protein of FIG. 1 or FIG. 7. For example, in some cases, a subject CasZ protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZl protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZl protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZl protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZl amino acid sequence of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes a CasZl amino acid sequence of FIG. 1 or FIG. 7, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein (e.g., such as at one or more catalytic amino acid positions) (e.g., in some cases such that the CasZ protein is a dCasZ). In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZl protein of FIG. 1 or FIG. 7 and has a length in a range of from 450-600 amino acids (e.g., 450-580, 480-600, 480-580, or 500-600 amino acids).

In some cases, a subject CasZ protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZe, CasZf, CasZg, or CasZh protein of FIG. 1 or FIG. 7. For example, in some cases, a subject CasZ protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZe, CasZf, CasZg, or CasZh protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZe, CasZf, CasZg, or CasZh protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZe, CasZf, CasZg, or CasZh protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having a CasZe, CasZf, CasZg, or CasZh protein sequence of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having a CasZe, CasZf, CasZg, or CasZh protein sequence of FIG. 1 or FIG. 7, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein (e.g., such as at one or more catalytic amino acid positions). In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZe, CasZf, CasZg, or CasZh protein of FIG. 1 or FIG. 7 and has a length in a range of from 350-900 amino acids (e.g., 350-850, 350-800, 400-900, 400-850, or 400-800 amino acids).

In some cases, a subject CasZ protein (of the subject compositions and/or methods) includes an amino acid sequence having 20% or more sequence identity (e.g., 30% or more, 40% or more, 50% or more, 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZa, CasZb, CasZc, CasZd, CasZe, CasZf, CasZg, CasZh, CasZi, CasZj, CasZk, or CasZl protein of FIG. 1 or FIG. 7. For example, in some cases, a subject CasZ protein includes an amino acid sequence having 50% or more sequence identity (e.g., 60% or more, 70% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZa, CasZb, CasZc, CasZd, CasZe, CasZf, CasZg, CasZh, CasZi, CasZj, CasZk, or CasZl protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 80% or more sequence identity (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZa, CasZb, CasZc, CasZd, CasZe, CasZf, CasZg, CasZh, CasZi, CasZj, CasZk, or CasZl protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZa, CasZb, CasZc, CasZd, CasZe, CasZf, CasZg, CasZh, CasZi, CasZj, CasZk, or CasZl protein of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having a CasZa, CasZb, CasZc, CasZd, CasZe, CasZf, CasZg, CasZh, CasZi, CasZj, CasZk, or CasZl protein sequence of FIG. 1 or FIG. 7. In some cases, a subject CasZ protein includes an amino acid sequence having a CasZa, CasZb, CasZc, CasZd, CasZe, CasZf, CasZg, CasZh, CasZi, CasZj, CasZk, or CasZl protein sequence of FIG. 1 or FIG. 7, with the exception that the sequence includes an amino acid substitution (e.g., 1, 2, or 3 amino acid substitutions) that reduces the naturally occurring catalytic activity of the protein (e.g., such as at one or more catalytic amino acid positions). In some cases, a subject CasZ protein includes an amino acid sequence having 90% or more sequence identity (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100% sequence identity) with a CasZa, CasZb, CasZc, CasZd, CasZe, CasZf, CasZg, CasZh, CasZi, CasZj, CasZk, or CasZl protein of FIG. 1 or FIG. 7 and has a length in a range of from 350-900 amino acids (e.g., 350-850, 350-800, 400-900, 400-850, or 400-800 amino acids).
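
As a non-limiting illustration of how a combined percent-identity and length criterion of the kind recited above could be evaluated, the following Python sketch may be considered. It assumes that the candidate and reference sequences have already been aligned by a separate tool, and it assumes one common convention for percent identity (identical residues divided by the number of alignment columns in which at least one sequence has a residue); the disclosure does not prescribe a particular alignment method or convention. The function names and the short sequences in the usage note are hypothetical and are not sequences of FIG. 1 or FIG. 7.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences ('-' marks a gap).

    Assumed convention: identical residues / columns where at least one
    sequence has a residue. Other conventions exist and give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Drop columns that are gaps in both sequences.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment contains no residues")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


def meets_criteria(candidate_aligned: str, reference_aligned: str,
                   min_identity: float = 90.0,
                   min_len: int = 350, max_len: int = 900) -> bool:
    """True if the candidate satisfies both the identity and length criteria."""
    # Length is counted on the ungapped candidate sequence.
    length = len(candidate_aligned.replace("-", ""))
    identity = percent_identity(candidate_aligned, reference_aligned)
    return identity >= min_identity and min_len <= length <= max_len
```

For example, `percent_identity("MKV-LA", "MKVQLA")` counts five identical positions over six alignment columns. The default thresholds in `meets_criteria` mirror the "90% or more" and "350-900 amino acids" values recited above and can be replaced with any of the other recited values.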

CasZ Variants

A variant CasZ protein has an amino acid sequence that is different by at least one amino acid (e.g., has a deletion, insertion, substitution, fusion) when compared to the amino acid sequence of the corresponding wild type CasZ protein. A CasZ protein that cleaves one strand but not the other of a double stranded target nucleic acid is referred to herein as a “nickase” (e.g., a “nickase CasZ”). A CasZ protein that has substantially no nuclease activity is referred to herein as a dead CasZ protein (“dCasZ”) (with the caveat that nuclease activity can be provided by a heterologous polypeptide, i.e., a fusion partner, in the case of a chimeric CasZ protein, which is described in more detail below). For any of the CasZ variant proteins described herein (e.g., nickase CasZ, dCasZ, chimeric CasZ), the CasZ variant can include a CasZ protein sequence with the same parameters described above (e.g., domains that are present, percent identity, length, and the like).

Variants—Catalytic Activity

In some cases, the CasZ protein is a variant CasZ protein, e.g., mutated relative to the naturally occurring catalytically active sequence, and exhibits reduced cleavage activity (e.g., exhibits 90% or less, 80% or less, 70% or less, 60% or less, 50% or less, 40% or less, or 30% or less cleavage activity) when compared to the corresponding naturally occurring sequence. In some cases, such a variant CasZ protein is a catalytically ‘dead’ protein (has substantially no cleavage activity) and can be referred to as a ‘dCasZ.’ In some cases, the variant CasZ protein is a nickase (cleaves only one strand of a double stranded target nucleic acid, e.g., a double stranded target DNA). As described in more detail herein, in some cases, a CasZ protein (in some cases a CasZ protein with wild type cleavage activity and in some cases a variant CasZ with reduced cleavage activity, e.g., a dCasZ or a nickase CasZ) is fused (conjugated) to a heterologous polypeptide that has an activity of interest (e.g., a catalytic activity of interest) to form a fusion protein (a chimeric CasZ protein).

Catalytic residues of CasZ include D405, E586 and D684 when numbered according to CasZi.1 (e.g., see FIG. 1). Thus, in some cases, the CasZ protein has reduced activity and one or more of the above described amino acids (or one or more corresponding amino acids of any CasZ protein) are mutated (e.g., substituted with an alanine). In some cases, the variant CasZ protein is a catalytically ‘dead’ protein (is catalytically inactive) and is referred to as ‘dCasZ.’ A dCasZ protein can be fused to a fusion partner that provides an activity, and in some cases, the dCasZ (e.g., one without a fusion partner that provides catalytic activity, but which can have an NLS when expressed in a eukaryotic cell) can bind to target DNA and can be used for imaging (e.g., the protein can be tagged/labeled) and/or can block RNA polymerase from transcribing from a target DNA. In some cases, the variant CasZ protein is a nickase (cleaves only one strand of a double stranded target nucleic acid, e.g., a double stranded target DNA).
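
As an illustration only, the alanine substitutions at the catalytic positions named above (D405, E586 and D684, CasZi.1 numbering) can be modeled as a simple string edit. The following is a minimal sketch under stated assumptions: positions are 1-based, the function and constant names are hypothetical, and the toy sequence is a placeholder of glycines carrying D/E/D at the three positions, not the CasZi.1 sequence of FIG. 1.

```python
# 1-based catalytic positions and expected wild type residues, as recited
# in the text for CasZi.1 numbering.
CATALYTIC_RESIDUES = {405: "D", 586: "E", 684: "D"}


def substitute_with_alanine(sequence: str,
                            residues: dict = CATALYTIC_RESIDUES) -> str:
    """Return a copy of `sequence` with each listed position set to alanine.

    Raises ValueError if a position is out of range or does not hold the
    expected wild type residue, which guards against numbering mix-ups.
    """
    edited = list(sequence)
    for position, expected in residues.items():
        if position < 1 or position > len(edited):
            raise ValueError(f"position {position} is outside the sequence")
        actual = edited[position - 1]
        if actual != expected:
            raise ValueError(
                f"expected {expected} at position {position}, found {actual}")
        edited[position - 1] = "A"
    return "".join(edited)


def build_toy_sequence(length: int = 700) -> str:
    """Placeholder sequence (hypothetical): glycines plus the catalytic residues."""
    residues = ["G"] * length
    for position, residue in CATALYTIC_RESIDUES.items():
        residues[position - 1] = residue
    return "".join(residues)


if __name__ == "__main__":
    toy = build_toy_sequence()
    variant = substitute_with_alanine(toy)
    # Report the residue now present at each catalytic position.
    print({pos: variant[pos - 1] for pos in CATALYTIC_RESIDUES})
```

For a CasZ protein other than CasZi.1, the corresponding positions would first have to be identified by alignment; the sketch does not attempt that step.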

Variants—Chimeric CasZ (i.e., Fusion Proteins)

As noted above, in some cases, a CasZ protein (in some cases a CasZ protein with wild type cleavage activity and in some cases a variant CasZ with reduced cleavage activity, e.g., a dCasZ or a nickase CasZ) is fused (conjugated) to a heterologous polypeptide that has an activity of interest (e.g., a catalytic activity of interest) to form a fusion protein (a chimeric CasZ protein). A heterologous polypeptide to which a CasZ protein can be fused is referred to herein as a ‘fusion partner.’

In some cases, the fusion partner can modulate transcription (e.g., inhibit transcription, increase transcription) of a target DNA. For example, in some cases the fusion partner is a protein (or a domain from a protein) that inhibits transcription (e.g., a transcriptional repressor, a protein that functions via recruitment of transcription inhibitor proteins, modification of target DNA such as methylation, recruitment of a DNA modifier, modulation of histones associated with target DNA, recruitment of a histone modifier such as those that modify acetylation and/or methylation of histones, and the like). In some cases the fusion partner is a protein (or a domain from a protein) that increases transcription (e.g., a transcription activator, a protein that acts via recruitment of transcription activator proteins, modification of target DNA such as demethylation, recruitment of a DNA modifier, modulation of histones associated with target DNA, recruitment of a histone modifier such as those that modify acetylation and/or methylation of histones, and the like).

In some cases, a chimeric CasZ protein includes a heterologous polypeptide that has enzymatic activity that modifies a target nucleic acid (e.g., nuclease activity such as FokI nuclease activity, methyltransferase activity, demethylase activity, DNA repair activity, DNA damage activity, deamination activity, dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity, transposase activity, recombinase activity, polymerase activity, ligase activity, helicase activity, photolyase activity or glycosylase activity).

In some cases, a chimeric CasZ protein includes a heterologous polypeptide that has enzymatic activity that modifies a polypeptide (e.g., a histone) associated with a target nucleic acid (e.g., methyltransferase activity, demethylase activity, acetyltransferase activity, deacetylase activity, kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity or demyristoylation activity).

Examples of proteins (or fragments thereof) that can be used to increase transcription include but are not limited to: transcriptional activators such as VP16, VP64, VP48, VP160, p65 subdomain (e.g., from NFkB), and activation domain of EDLL and/or TAL activation domain (e.g., for activity in plants); histone lysine methyltransferases such as SET1A, SET1B, MLL1 to 5, ASH1, SMYD2, NSD1, and the like; histone lysine demethylases such as JHDM2a/b, UTX, JMJD3, and the like; histone acetyltransferases such as GCN5, PCAF, CBP, p300, TAF1, TIP60/PLIP, MOZ/MYST3, MORF/MYST4, SRC1, ACTR, P160, CLOCK, and the like; and DNA demethylases such as Ten-Eleven Translocation (TET) dioxygenase 1 (TET1CD), TET1, DME, DML1, DML2, ROS1, and the like.

Examples of proteins (or fragments thereof) that can be used to decrease transcription include but are not limited to: transcriptional repressors such as the Krüppel associated box (KRAB or SKD); KOX 1 repression domain; the Mad mSIN3 interaction domain (SID); the ERF repressor domain (ERD); the SRDX repression domain (e.g., for repression in plants), and the like; histone lysine methyltransferases such as Pr-SET7/8, SUV4-20H1, RIZ1, and the like; histone lysine demethylases such as JMJD2A/JHDM3A, JMJD2B, JMJD2C/GASC1, JMJD2D, JARID1A/RBP2, JARID1B/PLU-1, JARID1C/SMCX, JARID1D/SMCY, and the like; histone lysine deacetylases such as HDAC1, HDAC2, HDAC3, HDAC8, HDAC4, HDAC5, HDAC7, HDAC9, SIRT1, SIRT2, HDAC11, and the like; DNA methylases such as HhaI DNA m5c-methyltransferase (M.HhaI), DNA methyltransferase 1 (DNMT1), DNA methyltransferase 3a (DNMT3a), DNA methyltransferase 3b (DNMT3b), METI, DRM3 (plants), ZMET2, CMT1, CMT2 (plants), and the like; and periphery recruitment elements such as Lamin A, Lamin B, and the like.

In some cases, the fusion partner has enzymatic activity that modifies the target nucleic acid (e.g., ssRNA, dsRNA, ssDNA, dsDNA). Examples of enzymatic activity that can be provided by the fusion partner include but are not limited to: nuclease activity such as that provided by a restriction enzyme (e.g., FokI nuclease), methyltransferase activity such as that provided by a methyltransferase (e.g., HhaI DNA m5c-methyltransferase (M.HhaI), DNA methyltransferase 1 (DNMT1), DNA methyltransferase 3a (DNMT3a), DNA methyltransferase 3b (DNMT3b), METI, DRM3 (plants), ZMET2, CMT1, CMT2 (plants), and the like), demethylase activity such as that provided by a demethylase (e.g., Ten-Eleven Translocation (TET) dioxygenase 1 (TET1CD), TET1, DME, DML1, DML2, ROS1, and the like), DNA repair activity, DNA damage activity, deamination activity such as that provided by a deaminase (e.g., a cytosine deaminase enzyme such as rat APOBEC1), dismutase activity, alkylation activity, depurination activity, oxidation activity, pyrimidine dimer forming activity, integrase activity such as that provided by an integrase and/or resolvase (e.g., Gin invertase such as the hyperactive mutant of the Gin invertase, GinH106Y; human immunodeficiency virus type 1 integrase (IN); Tn3 resolvase; and the like), transposase activity, recombinase activity such as that provided by a recombinase (e.g., catalytic domain of Gin recombinase), polymerase activity, ligase activity, helicase activity, photolyase activity, and glycosylase activity.

In some cases, the fusion partner has enzymatic activity that modifies a protein associated with the target nucleic acid (e.g., ssRNA, dsRNA, ssDNA, dsDNA) (e.g., a histone, an RNA binding protein, a DNA binding protein, and the like). Examples of enzymatic activity (that modifies a protein associated with a target nucleic acid) that can be provided by the fusion partner include but are not limited to: methyltransferase activity such as that provided by a histone methyltransferase (HMT) (e.g., suppressor of variegation 3-9 homolog 1 (SUV39H1, also known as KMT1A), euchromatic histone lysine methyltransferase 2 (G9A, also known as KMT1C and EHMT2), SUV39H2, ESET/SETDB1, and the like, SET1A, SET1B, MLL1 to 5, ASH1, SMYD2, NSD1, DOT1L, Pr-SET7/8, SUV4-20H1, EZH2, RIZ1), demethylase activity such as that provided by a histone demethylase (e.g., Lysine Demethylase 1A (KDM1A also known as LSD1), JHDM2a/b, JMJD2A/JHDM3A, JMJD2B, JMJD2C/GASC1, JMJD2D, JARID1A/RBP2, JARID1B/PLU-1, JARID1C/SMCX, JARID1D/SMCY, UTX, JMJD3, and the like), acetyltransferase activity such as that provided by a histone acetyltransferase (e.g., catalytic core/fragment of the human acetyltransferase p300, GCN5, PCAF, CBP, TAF1, TIP60/PLIP, MOZ/MYST3, MORF/MYST4, HBO1/MYST2, HMOF/MYST1, SRC1, ACTR, P160, CLOCK, and the like), deacetylase activity such as that provided by a histone deacetylase (e.g., HDAC1, HDAC2, HDAC3, HDAC8, HDAC4, HDAC5, HDAC7, HDAC9, SIRT1, SIRT2, HDAC11, and the like), kinase activity, phosphatase activity, ubiquitin ligase activity, deubiquitinating activity, adenylation activity, deadenylation activity, SUMOylating activity, deSUMOylating activity, ribosylation activity, deribosylation activity, myristoylation activity, and demyristoylation activity.

Additional examples of suitable fusion partners are a dihydrofolate reductase (DHFR) destabilization domain (e.g., to generate a chemically controllable chimeric CasZ protein) and a chloroplast transit peptide. Suitable chloroplast transit peptides include, but are not limited to:

(SEQ ID NO: 101) MASMISSSAVTTVSRASRGQSAAMAPFGGLKSMTGFPVRKVNTDITSITSNGGRVKCMQVWPPIGKKKFETLSYLPPLTRDSRA;
(SEQ ID NO: 102) MASMISSSAVTTVSRASRGQSAAMAPFGGLKSMTGFPVRKVNTDITSITSNGGRVKS;
(SEQ ID NO: 103) MASSMLSSATMVASPAQATMVAPFNGLKSSAAFPATRKANNDITSITSNGGRVNCMQVWPPIEKKKFETLSYLPDLTDSGGRVNC;
(SEQ ID NO: 104) MAQVSRICNGVQNPSLISNLSKSSQRKSPLSVSLKTQQHPRAYPISSSWGLKKSGMTLIGSELRPLKVMSSVSTAC;
(SEQ ID NO: 105) MAQVSRICNGVWNPSLISNLSKSSQRKSPLSVSLKTQQHPRAYPISSSWGLKKSGMTLIGSELRPLKVMSSVSTAC;
(SEQ ID NO: 106) MAQINNMAQGIQTLNPNSNFHKPQVPKSSSFLVFGSKKLKNSANSMLVLKKDSIFMQLFCSFRISASVATAC;
(SEQ ID NO: 107) MAALVTSQLATSGTVLSVTDRFRRPGFQGLRPRNPADAALGMRTVGASAAPKQSRKPHRFDRRCLSMVV;
(SEQ ID NO: 108) MAALTTSQLATSATGFGIADRSAPSSLLRHGFQGLKPRSPAGGDATSLSVTTSARATPKQQRSVQRGSRRFPSVVVC;
(SEQ ID NO: 109) MASSVLSSAAVATRSNVAQANMVAPFTGLKSAASFPVSRKQNLDITSIASNGGRVQC;
(SEQ ID NO: 110) MESLAATSVFAPSRVAVPAARALVRAGTVVPTRRTSSTSGTSGVKCSAAVTPQASPVISRSAAAA; and
(SEQ ID NO: 111) MGAAATSMQSLKFSNRLVPPSRRLSPVPNNVTCNNLPKSAAPVRTVKCCASSWNSTINGAAATTNGASAASS.

In some cases, a CasZ fusion polypeptide of the present disclosure comprises: a) a CasZ polypeptide of the present disclosure; and b) a chloroplast transit peptide. Thus, for example, a CRISPR-CasZ complex can be targeted to the chloroplast. In some cases, this targeting may be achieved by the presence of an N-terminal extension, called a chloroplast transit peptide (CTP) or plastid transit peptide. Chromosomal transgenes from bacterial sources must have a sequence encoding a CTP sequence fused to a sequence encoding an expressed polypeptide if the expressed polypeptide is to be compartmentalized in the plant plastid (e.g., chloroplast). Accordingly, localization of an exogenous polypeptide to a chloroplast is often accomplished by means of operably linking a polynucleotide sequence encoding a CTP sequence to the 5′ region of a polynucleotide encoding the exogenous polypeptide. The CTP is removed in a processing step during translocation into the plastid. Processing efficiency may, however, be affected by the amino acid sequence of the CTP and nearby sequences at the NH2 terminus of the peptide. Other options for targeting to the chloroplast which have been described are the maize cab-m7 signal sequence (U.S. Pat. No. 7,022,896, WO 97/41228), a pea glutathione reductase signal sequence (WO 97/41228), and the CTP described in US2009029861.

In some cases, a CasZ fusion polypeptide of the present disclosure can comprise: a) a CasZ polypeptide of the present disclosure; and b) an endosomal escape peptide. In some cases, an endosomal escape polypeptide comprises the amino acid sequence GLFXALLXLLXSLWXLLLXA (SEQ ID NO: 112), wherein each X is independently selected from lysine, histidine, and arginine. In some cases, an endosomal escape polypeptide comprises the amino acid sequence GLFHALLHLLHSLWHLLLHA (SEQ ID NO: 113).
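
As an illustration only, the degenerate endosomal escape sequence of SEQ ID NO: 112 can be written as a regular expression in which each X is the character class [KHR] (lysine, histidine, arginine). The following is a minimal sketch; the function and constant names are hypothetical, and because the text says the polypeptide "comprises" the sequence, the sketch searches for the motif anywhere in the input rather than requiring an exact full-length match.

```python
import re

# SEQ ID NO: 112 with each X expanded to K, H, or R (one-letter codes).
ESCAPE_MOTIF = re.compile(r"GLF[KHR]ALL[KHR]LL[KHR]SLW[KHR]LLL[KHR]A")

# SEQ ID NO: 113 as recited in the text.
SEQ_ID_NO_113 = "GLFHALLHLLHSLWHLLLHA"


def contains_escape_motif(sequence: str) -> bool:
    """True if `sequence` contains an instance of the SEQ ID NO: 112 pattern."""
    return ESCAPE_MOTIF.search(sequence) is not None


if __name__ == "__main__":
    # SEQ ID NO: 113 is one concrete instance of the pattern (every X is H).
    print(contains_escape_motif(SEQ_ID_NO_113))
    # Alanine at the first X position is outside the allowed set.
    print(contains_escape_motif("GLFAALLHLLHSLWHLLLHA"))
```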

For examples of some of the above fusion partners (and more) used in the context of fusions with Cas9, Zinc Finger, and/or TALE proteins (for site specific target nucleic acid modification, modulation of transcription, and/or target protein modification, e.g., histone modification), see, e.g.: Nomura et al., J Am Chem Soc. 2007 Jul. 18; 129(28):8676-7; Rivenbark et al., Epigenetics. 2012 April; 7(4):350-60; Nucleic Acids Res. 2016 Jul. 8; 44(12):5615-28; Gilbert et al., Cell. 2013 Jul. 18; 154(2):442-51; Kearns et al., Nat Methods. 2015 May; 12(5):401-3; Mendenhall et al., Nat Biotechnol. 2013 December; 31(12):1133-6; Hilton et al., Nat Biotechnol. 2015 May; 33(5):510-7; Gordley et al., Proc Natl Acad Sci USA. 2009 Mar. 31; 106(13):5053-8; Akopian et al., Proc Natl Acad Sci USA. 2003 Jul. 22; 100(15):8688-91; Tan et al., J Virol. 2006 February; 80(4):1939-48; Tan et al., Proc Natl Acad Sci USA. 2003 Oct. 14; 100(21):11997-2002; Papworth et al., Proc Natl Acad Sci USA. 2003 Feb. 18; 100(4):1621-6; Sanjana et al., Nat Protoc. 2012 Jan. 5; 7(1):171-92; Beerli et al., Proc Natl Acad Sci USA. 1998 Dec. 8; 95(25):14628-33; Snowden et al., Curr Biol. 2002 Dec. 23; 12(24):2159-66; Xu et al., Cell Discov. 2016 May 3; 2:16009; Komor et al., Nature. 2016 Apr. 20; 533(7603):420-4; Chaikind et al., Nucleic Acids Res. 2016 Aug. 11; Choudhury et al., Oncotarget. 2016 Jun. 23; Du et al., Cold Spring Harb Protoc. 2016 Jan. 4; Pham et al., Methods Mol Biol. 2016; 1358:43-57; Balboa et al., Stem Cell Reports. 2015 Sep. 8; 5(3):448-59; Hara et al., Sci Rep. 2015 Jun. 9; 5:11221; Piatek et al., Plant Biotechnol J. 2015 May; 13(4):578-89; Hu et al., Nucleic Acids Res. 2014 April; 42(7):4375-90; Cheng et al., Cell Res. 2013 October; 23(10):1163-71; and Maeder et al., Nat Methods. 2013 October; 10(10):977-9.

Additional suitable heterologous polypeptides include, but are not limited to, a polypeptide that directly and/or indirectly provides for increased transcription and/or translation of a target nucleic acid (e.g., a transcription activator or a fragment thereof, a protein or fragment thereof that recruits a transcription activator, a small molecule/drug-responsive transcription and/or translation regulator, a translation-regulating protein, etc.). Non-limiting examples of heterologous polypeptides to accomplish increased or decreased transcription include transcription activator and transcription repressor domains. In some such cases, a chimeric CasZ polypeptide is targeted by the guide nucleic acid (guide RNA) to a specific location (i.e., sequence) in the target nucleic acid and exerts locus-specific regulation such as blocking RNA polymerase binding to a promoter (which selectively inhibits transcription activator function), and/or modifying the local chromatin status (e.g., when a fusion sequence is used that modifies the target nucleic acid or modifies a polypeptide associated with the target nucleic acid). In some cases, the changes are transient (e.g., transcription repression or activation). In some cases, the changes are inheritable (e.g., when epigenetic modifications are made to the target nucleic acid or to proteins associated with the target nucleic acid, e.g., nucleosomal histones).

Non-limiting examples of heterologous polypeptides for use when targeting ssRNA target nucleic acids include: splicing factors (e.g., RS domains); protein translation components (e.g., translation initiation, elongation, and/or release factors; e.g., eIF4G); RNA methylases; RNA editing enzymes (e.g., RNA deaminases, e.g., adenosine deaminase acting on RNA (ADAR), including A to I and/or C to U editing enzymes); helicases; RNA-binding proteins; and the like. It is understood that a heterologous polypeptide can include the entire protein or in some cases can include a fragment of the protein (e.g., a functional domain).

The heterologous polypeptide of a subject chimeric CasZ polypeptide can be any domain capable of interacting with ssRNA (which, for the purposes of this disclosure, includes intramolecular and/or intermolecular secondary structures, e.g., double-stranded RNA duplexes such as hairpins, stem-loops, etc.), whether transiently or irreversibly, directly or indirectly, including but not limited to an effector domain selected from the group comprising: Endonucleases (for example RNase III, the CRR22 DYW domain, Dicer, and PIN (PilT N-terminus) domains from proteins such as SMG5 and SMG6); proteins and protein domains responsible for stimulating RNA cleavage (for example CPSF, CstF, CFIm and CFIIm); Exonucleases (for example XRN-1 or Exonuclease T); Deadenylases (for example HNT3); proteins and protein domains responsible for nonsense mediated RNA decay (for example UPF1, UPF2, UPF3, UPF3b, RNP S1, Y14, DEK, REF2, and SRm160); proteins and protein domains responsible for stabilizing RNA (for example PABP); proteins and protein domains responsible for repressing translation (for example Ago2 and Ago4); proteins and protein domains responsible for stimulating translation (for example Staufen); proteins and protein domains responsible for (e.g., capable of) modulating translation (e.g., translation factors such as initiation factors, elongation factors, release factors, etc., e.g., eIF4G); proteins and protein domains responsible for polyadenylation of RNA (for example PAP1, GLD-2, and Star-PAP); proteins and protein domains responsible for polyuridinylation of RNA (for example CID1 and terminal uridylate transferase); proteins and protein domains responsible for RNA localization (for example from IMP1, ZBP1, She2p, She3p, and Bicaudal-D); proteins and protein domains responsible for nuclear retention of RNA (for example Rrp6); proteins and protein domains responsible for nuclear export of RNA (for example TAP, NXF1, THO, TREX, REF, and Aly); proteins and protein domains responsible for repression of RNA splicing (for example PTB, Sam68, and hnRNP A1); proteins and protein domains responsible for stimulation of RNA splicing (for example Serine/Arginine-rich (SR) domains); proteins and protein domains responsible for reducing the efficiency of transcription (for example FUS (TLS)); and proteins and protein domains responsible for stimulating transcription (for example CDK7 and HIV Tat). Alternatively, the effector domain may be selected from the group comprising Endonucleases; proteins and protein domains capable of stimulating RNA cleavage; Exonucleases; Deadenylases; proteins and protein domains having nonsense mediated RNA decay activity; proteins and protein domains capable of stabilizing RNA; proteins and protein domains capable of repressing translation; proteins and protein domains capable of stimulating translation; proteins and protein domains capable of modulating translation (e.g., translation factors such as initiation factors, elongation factors, release factors, etc., e.g., eIF4G); proteins and protein domains capable of polyadenylation of RNA; proteins and protein domains capable of polyuridinylation of RNA; proteins and protein domains having RNA localization activity; proteins and protein domains capable of nuclear retention of RNA; proteins and protein domains having RNA nuclear export activity; proteins and protein domains capable of repression of RNA splicing; proteins and protein domains capable of stimulation of RNA splicing; proteins and protein domains capable of reducing the efficiency of transcription; and proteins and protein domains capable of stimulating transcription. Another suitable heterologous polypeptide is a PUF RNA-binding domain, which is described in more detail in WO2012068627, which is hereby incorporated by reference in its entirety.

Some RNA splicing factors that can be used (in whole or as fragments thereof) as heterologous polypeptides for a chimeric CasZ polypeptide have modular organization, with separate sequence-specific RNA binding modules and splicing effector domains. For example, members of the Serine/Arginine-rich (SR) protein family contain N-terminal RNA recognition motifs (RRMs) that bind to exonic splicing enhancers (ESEs) in pre-mRNAs and C-terminal RS domains that promote exon inclusion. As another example, the hnRNP protein hnRNP A1 binds to exonic splicing silencers (ESSs) through its RRM domains and inhibits exon inclusion through a C-terminal Glycine-rich domain. Some splicing factors can regulate alternative use of splice sites (ss) by binding to regulatory sequences between the two alternative sites. For example, ASF/SF2 can recognize ESEs and promote the use of intron proximal sites, whereas hnRNP A1 can bind to ESSs and shift splicing towards the use of intron distal sites. One application for such factors is to generate ESFs that modulate alternative splicing of endogenous genes, particularly disease associated genes. For example, Bcl-x pre-mRNA produces two splicing isoforms with two alternative 5′ splice sites to encode proteins of opposite functions. The long splicing isoform Bcl-xL is a potent apoptosis inhibitor expressed in long-lived postmitotic cells and is up-regulated in many cancer cells, protecting cells against apoptotic signals. The short isoform Bcl-xS is a pro-apoptotic isoform and is expressed at high levels in cells with a high turnover rate (e.g., developing lymphocytes). The ratio of the two Bcl-x splicing isoforms is regulated by multiple cis-elements that are located in either the core exon region or the exon extension region (i.e., between the two alternative 5′ splice sites). For more examples, see WO2010075303, which is hereby incorporated by reference in its entirety.

Further suitable fusion partners include, but are not limited to, proteins (or fragments thereof) that are boundary elements (e.g., CTCF), proteins and fragments thereof that provide periphery recruitment (e.g., Lamin A, Lamin B, etc.), and protein docking elements (e.g., FKBP/FRB, Pil1/Aby1, etc.).

Examples of various additional suitable heterologous polypeptides (or fragments thereof) for a subject chimeric CasZ polypeptide include, but are not limited to, those described in the following applications (which publications are related to other CRISPR endonucleases such as Cas9, but the described fusion partners can also be used with CasZ instead): PCT patent applications: WO2010075303, WO2012068627, and WO2013155555, and can be found, for example, in U.S. patents and patent applications: U.S. Pat. Nos. 8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445; 8,865,406; 8,795,965; 8,771,945; 8,697,359; 20140068797; 20140170753; 20140179006; 20140179770; 20140186843; 20140186919; 20140186958; 20140189896; 20140227787; 20140234972; 20140242664; 20140242699; 20140242700; 20140242702; 20140248702; 20140256046; 20140273037; 20140273226; 20140273230; 20140273231; 20140273232; 20140273233; 20140273234; 20140273235; 20140287938; 20140295556; 20140295557; 20140298547; 20140304853; 20140309487; 20140310828; 20140310830; 20140315985; 20140335063; 20140335620; 20140342456; 20140342457; 20140342458; 20140349400; 20140349405; 20140356867; 20140356956; 20140356958; 20140356959; 20140357523; 20140357530; 20140364333; and 20140377868; all of which are hereby incorporated by reference in their entirety.

In some cases, a heterologous polypeptide (a fusion partner) provides for subcellular localization, i.e., the heterologous polypeptide contains a subcellular localization sequence (e.g., a nuclear localization signal (NLS) for targeting to the nucleus, a sequence to keep the fusion protein out of the nucleus, e.g., a nuclear export sequence (NES), a sequence to keep the fusion protein retained in the cytoplasm, a mitochondrial localization signal for targeting to the mitochondria, a chloroplast localization signal for targeting to a chloroplast, an ER retention signal, and the like). In some embodiments, a CasZ fusion polypeptide does not include an NLS so that the protein is not targeted to the nucleus (which can be advantageous, e.g., when the target nucleic acid is an RNA that is present in the cytosol). In some embodiments, the heterologous polypeptide can provide a tag (i.e., the heterologous polypeptide is a detectable label) for ease of tracking and/or purification (e.g., a fluorescent protein, e.g., green fluorescent protein (GFP), YFP, RFP, CFP, mCherry, tdTomato, and the like; a histidine tag, e.g., a 6×His tag; a hemagglutinin (HA) tag; a FLAG tag; a Myc tag; and the like).

In some cases a CasZ protein (e.g., a wild type CasZ protein, a variant CasZ protein, a chimeric CasZ protein, a dCasZ protein, a chimeric CasZ protein where the CasZ portion has reduced nuclease activity, such as a dCasZ protein fused to a fusion partner, and the like) includes (is fused to) a nuclear localization signal (NLS) (e.g., in some cases 2 or more, 3 or more, 4 or more, or 5 or more NLSs). Thus, in some cases, a CasZ polypeptide includes one or more NLSs (e.g., 2 or more, 3 or more, 4 or more, or 5 or more NLSs). In some cases, one or more NLSs (2 or more, 3 or more, 4 or more, or 5 or more NLSs) are positioned at or near (e.g., within 50 amino acids of) the N-terminus and/or the C-terminus. In some cases, one or more NLSs (2 or more, 3 or more, 4 or more, or 5 or more NLSs) are positioned at or near (e.g., within 50 amino acids of) the N-terminus. In some cases, one or more NLSs (2 or more, 3 or more, 4 or more, or 5 or more NLSs) are positioned at or near (e.g., within 50 amino acids of) the C-terminus. In some cases, one or more NLSs (3 or more, 4 or more, or 5 or more NLSs) are positioned at or near (e.g., within 50 amino acids of) both the N-terminus and the C-terminus. In some cases, an NLS is positioned at the N-terminus and an NLS is positioned at the C-terminus.

In some cases a CasZ protein (e.g., a wild type CasZ protein, a variant CasZ protein, a chimeric CasZ protein, a dCasZ protein, a chimeric CasZ protein where the CasZ portion has reduced nuclease activity, such as a dCasZ protein fused to a fusion partner, and the like) includes (is fused to) between 1 and 10 NLSs (e.g., 1-9, 1-8, 1-7, 1-6, 1-5, 2-10, 2-9, 2-8, 2-7, 2-6, or 2-5 NLSs). In some cases a CasZ protein (e.g., a wild type CasZ protein, a variant CasZ protein, a chimeric CasZ protein, a dCasZ protein, a chimeric CasZ protein where the CasZ portion has reduced nuclease activity, such as a dCasZ protein fused to a fusion partner, and the like) includes (is fused to) between 2 and 5 NLSs (e.g., 2-4, or 2-3 NLSs).

Non-limiting examples of NLSs include an NLS sequence derived from: the NLS of the SV40 virus large T-antigen, having the amino acid sequence PKKKRKV (SEQ ID NO: 114); the NLS from nucleoplasmin (e.g., the nucleoplasmin bipartite NLS with the sequence KRPAATKKAGQAKKKK (SEQ ID NO: 115)); the c-myc NLS having the amino acid sequence PAAKRVKLD (SEQ ID NO: 116) or RQRRNELKRSP (SEQ ID NO: 117); the hRNPA1 M9 NLS having the sequence NQSSNFGPMKGGNFGGRSSGPYGGGGQYFAKPRNQGGY (SEQ ID NO: 118); the sequence RMRIZFKNKGKDTAELRRRRVEVSVELRKAKKDEQILKRRNV (SEQ ID NO: 119) of the IBB domain from importin-alpha; the sequences VSRKRPRP (SEQ ID NO: 120) and PPKKARED (SEQ ID NO: 121) of the myoma T protein; the sequence PQPKKKPL (SEQ ID NO: 122) of human p53; the sequence SALIKKKKKMAP (SEQ ID NO: 123) of mouse c-abl IV; the sequences DRLRR (SEQ ID NO: 124) and PKQKKRK (SEQ ID NO: 125) of the influenza virus NS1; the sequence RKLKKKIKKL (SEQ ID NO: 126) of the Hepatitis virus delta antigen; the sequence REKKKFLKRR (SEQ ID NO: 127) of the mouse Mx1 protein; the sequence KRKGDEVDGVDEVAKKKSKK (SEQ ID NO: 128) of the human poly(ADP-ribose) polymerase; and the sequence RKCLQAGMNLEARKTKK (SEQ ID NO: 129) of the steroid hormone receptors (human) glucocorticoid. In general, the NLS (or multiple NLSs) is of sufficient strength to drive accumulation of the CasZ protein in a detectable amount in the nucleus of a eukaryotic cell. Detection of accumulation in the nucleus may be performed by any suitable technique. For example, a detectable marker may be fused to the CasZ protein such that location within a cell may be visualized. Cell nuclei may also be isolated from cells, the contents of which may then be analyzed by any suitable process for detecting protein, such as immunohistochemistry, Western blot, or enzyme activity assay. Accumulation in the nucleus may also be determined indirectly.
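
As an illustration only, the placement of an NLS "at or near (e.g., within 50 amino acids of)" a terminus can be checked on a fusion sequence. The following is a minimal sketch under stated assumptions: "within 50 amino acids" is read as at most 50 residues lying between the NLS and the terminus, the SV40 NLS (SEQ ID NO: 114) stands in for any NLS, the function names are hypothetical, and the core sequence is a placeholder of glycines rather than a CasZ sequence.

```python
SV40_NLS = "PKKKRKV"  # SEQ ID NO: 114 as recited in the text


def find_all(sequence: str, motif: str) -> list:
    """Return the 0-based start index of every occurrence of `motif`."""
    starts = []
    index = sequence.find(motif)
    while index != -1:
        starts.append(index)
        index = sequence.find(motif, index + 1)
    return starts


def nls_near_termini(sequence: str, nls: str = SV40_NLS,
                     window: int = 50) -> tuple:
    """Return (near_n_terminus, near_c_terminus) for the given NLS.

    An occurrence is "near" a terminus when at most `window` residues
    separate it from that terminus (assumed reading of the text).
    """
    starts = find_all(sequence, nls)
    near_n = any(start <= window for start in starts)
    near_c = any(len(sequence) - (start + len(nls)) <= window
                 for start in starts)
    return near_n, near_c


if __name__ == "__main__":
    core = "G" * 300  # placeholder for a CasZ polypeptide (hypothetical)
    both_ends = SV40_NLS + core + SV40_NLS
    n_only = SV40_NLS + core
    print(nls_near_termini(both_ends))
    print(nls_near_termini(n_only))
```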

In some cases, a CasZ fusion polypeptide includes a “ProteinTransduction Domain” or PTD (also known as a CPP—cell penetratingpeptide), which refers to a polypeptide, polynucleotide, carbohydrate,or organic or inorganic compound that facilitates traversing a lipidbilayer, micelle, cell membrane, organelle membrane, or vesiclemembrane. A PTD attached to another molecule, which can range from asmall polar molecule to a large macromolecule and/or a nanoparticle,facilitates the molecule traversing a membrane, for example going fromextracellular space to intracellular space, or cytosol to within anorganelle. In some embodiments, a PTD is covalently linked to the aminoterminus a polypeptide (e.g., linked to a wild type CasZ to generate afusino protein, or linked to a variant CasZ protein such as a dCasZ,nickase CasZ, or chimeric CasZ protein to generate a fusion protein). Insome embodiments, a PTD is covalently linked to the carboxyl terminus ofa polypeptide (e.g., linked to a wild type CasZ to generate a fusinoprotein, or linked to a variant CasZ protein such as a dCasZ, nickaseCasZ, or chimeric CasZ protein to generate a fusion protein). In somecases, the PTD is inserted interally in the CasZ fusion polypeptide(i.e., is not at the N- or C-terminus of the CasZ fusion polypeptide) ata suitable insertion site. In some cases, a subject CasZ fusionpolypeptide includes (is conjugated to, is fused to) one or more PTDs(e.g., two or more, three or more, four or more PTDs). In some cases, aPTD includes a nuclear localization signal (NLS) (e.g, in some cases 2or more, 3 or more, 4 or more, or 5 or more NLSs). Thus, in some cases,a CasZ fusion polypeptide includes one or more NLSs (e.g., 2 or more, 3or more, 4 or more, or 5 or more NLSs). 
In some cases, a PTD iscovalently linked to a nucleic acid (e.g., a CasZ guide nucleic acid, apolynucleotide encoding a CasZ guide nucleic acid, a polynucleotideencoding a CasZ fusion polypeptide, a donor polynucleotide, etc.).Examples of PTDs include but are not limited to a minimal undecapeptideprotein transduction domain (corresponding to residues 47-57 of HIV-1TAT comprising YGRKKRRQRRR; SEQ ID NO: 130); a polyarginine sequencecomprising a number of arginines sufficient to direct entry into a cell(e.g., 3, 4, 5, 6, 7, 8, 9, 10, or 10-50 arginines); a VP22 domain(Zender et al. (2002) Cancer Gene Ther. 9(6):489-96); an DrosophilaAntennapedia protein transduction domain (Noguchi et al. (2003) Diabetes52(7):1732-1737); a truncated human calcitonin peptide (Trehin et al.(2004) Pharm. Research 21:1248-1256); polylysine (Wender et al. (2000)Proc. Natl. Acad. Sci. USA 97:13003-13008); RRQRRTSKLMKR (SEQ ID NO:131); Transportan GWTLNSAGYLLGKINLKALAALAKKIL (SEQ ID NO: 132);KALAWEAKLAKALAKALAKHLAKALAKALKCEA (SEQ ID NO: 133); and RQIKIWFQNRRMKWKK(SEQ ID NO: 134). Exemplary PTDs include but are not limited to,YGRKKRRQRRR (SEQ ID NO: 130), RKKRRQRRR (SEQ ID NO: 135); an argininehomopolymer of from 3 arginine residues to 50 arginine residues;Exemplary PTD domain amino acid sequences include, but are not limitedto, any of the following: YGRKKRRQRRR (SEQ ID NO: 130); RKKRRQRR (SEQ IDNO: 136); YARAAARQARA (SEQ ID NO: 137); THRLPRRRRRR (SEQ ID NO: 138);and GGRRARRRRRR (SEQ ID NO: 139). In some embodiments, the PTD is anactivatable CPP (ACPP) (Aguilera et al. (2009) Integr Biol (Camb) June;1(5-6): 371-381). ACPPs comprise a polycationic CPP (e.g., Arg9 or “R9”)connected via a cleavable linker to a matching polyanion (e.g., Glu9 or“E9”), which reduces the net charge to nearly zero and thereby inhibitsadhesion and uptake into cells. 
Upon cleavage of the linker, the polyanion is released, locally unmasking the polyarginine and its inherent adhesiveness, thus “activating” the ACPP to traverse the membrane.
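By way of non-limiting illustration, the charge masking described for an ACPP can be sketched in Python by counting charged side chains. The charge table (+1 for Arg and Lys, -1 for Asp and Glu, with termini and histidine ignored), the function name net_charge, and the use of GSGSG as a stand-in for the cleavable linker are assumptions made only for this example and are not taken from the disclosure.

```python
# Approximate side-chain charges at neutral pH (a simplifying assumption).
CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def net_charge(peptide: str) -> int:
    """Sum approximate side-chain charges; termini and histidine are ignored."""
    return sum(CHARGE.get(residue, 0) for residue in peptide.upper())


r9 = "R" * 9      # polycationic CPP ("R9")
e9 = "E" * 9      # matching polyanion ("E9")
linker = "GSGSG"  # hypothetical uncharged placeholder for the cleavable linker

intact = e9 + linker + r9
print(net_charge(intact))  # masked ACPP: charges of E9 and R9 offset one another
print(net_charge(r9))      # polyarginine alone, once the polyanion is released
```

Under these assumptions the intact construct carries no net side-chain charge, whereas the released polyarginine carries the full positive charge of its nine arginines.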

Linkers (e.g., for Fusion Partners)

In some instances, a subject CasZ protein is fused to a fusion partner via a linker polypeptide (e.g., one or more linker polypeptides). The linker polypeptide may have any of a variety of amino acid sequences. Proteins can be joined by a spacer peptide, generally of a flexible nature, although other chemical linkages are not excluded. Suitable linkers include polypeptides of between 4 amino acids and 40 amino acids in length, or between 4 amino acids and 25 amino acids in length. These linkers can be produced by using synthetic, linker-encoding oligonucleotides to couple the proteins, or can be encoded by a nucleic acid sequence encoding the fusion protein. Peptide linkers with a degree of flexibility can be used. The linking peptides may have virtually any amino acid sequence, bearing in mind that the preferred linkers will have a sequence that results in a generally flexible peptide. Small amino acids, such as glycine and alanine, are of use in creating a flexible peptide. The creation of such sequences is routine to those of skill in the art. A variety of different linkers are commercially available and are considered suitable for use.

Examples of linker polypeptides include glycine polymers (G)_(n), glycine-serine polymers (including, for example, (GS)_(n), GSGGS_(n) (SEQ ID NO: 140), GGSGGS_(n) (SEQ ID NO: 141), and GGGS_(n) (SEQ ID NO: 142), where n is an integer of at least one), glycine-alanine polymers, and alanine-serine polymers. Exemplary linkers can comprise amino acid sequences including, but not limited to, GGSG (SEQ ID NO: 143), GGSGG (SEQ ID NO: 144), GSGSG (SEQ ID NO: 145), GSGGG (SEQ ID NO: 146), GGGSG (SEQ ID NO: 147), GSSSG (SEQ ID NO: 148), and the like. The ordinarily skilled artisan will recognize that design of a peptide conjugated to any desired element can include linkers that are all or partially flexible, such that the linker can include a flexible linker as well as one or more portions that confer less flexible structure.
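As a non-limiting sketch, the repeat-unit linkers listed above can be enumerated in Python. The sketch assumes that a notation such as GSGGS_(n) denotes n tandem copies of the whole unit, and it applies the 4 to 40 amino acid length range mentioned above for suitable linkers. The function names repeat_linker and within_length_range are hypothetical.

```python
def repeat_linker(unit: str, n: int) -> str:
    """Return the linker formed by n tandem copies of a repeat unit."""
    if n < 1:
        raise ValueError("n must be an integer of at least one")
    return unit * n


def within_length_range(linker: str, min_len: int = 4, max_len: int = 40) -> bool:
    """Check a linker against a length range given in amino acids."""
    return min_len <= len(linker) <= max_len


# Repeat units named in the text, each expanded with n = 4 as an example.
for unit in ("G", "GS", "GSGGS", "GGSGGS", "GGGS"):
    linker = repeat_linker(unit, 4)
    print(unit, linker, within_length_range(linker))
```

The narrower 4 to 25 amino acid range can be checked by passing max_len=25.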

Detectable Labels

In some cases, a CasZ polypeptide of the present disclosure comprises (e.g., can be attached/fused to) a detectable label. Suitable detectable labels and/or moieties that can provide a detectable signal can include, but are not limited to, an enzyme; a radioisotope; a member of a specific binding pair; a fluorophore; a fluorescent protein; a quantum dot; and the like.

Suitable fluorescent proteins include, but are not limited to, green fluorescent protein (GFP) or variants thereof, blue fluorescent variant of GFP (BFP), cyan fluorescent variant of GFP (CFP), yellow fluorescent variant of GFP (YFP), enhanced GFP (EGFP), enhanced CFP (ECFP), enhanced YFP (EYFP), GFPS65T, Emerald, Topaz (TYFP), Venus, Citrine, mCitrine, GFPuv, destabilised EGFP (dEGFP), destabilised ECFP (dECFP), destabilised EYFP (dEYFP), mCFPm, Cerulean, T-Sapphire, CyPet, YPet, mKO, HcRed, t-HcRed, DsRed, DsRed2, DsRed-monomer, J-Red, dimer2, t-dimer2(12), mRFP1, pocilloporin, Renilla GFP, Monster GFP, paGFP, Kaede protein and kindling protein, Phycobiliproteins and Phycobiliprotein conjugates including B-Phycoerythrin, R-Phycoerythrin and Allophycocyanin. Other examples of fluorescent proteins include mHoneydew, mBanana, mOrange, dTomato, tdTomato, mTangerine, mStrawberry, mCherry, mGrape1, mRaspberry, mGrape2, mPlum (Shaner et al. (2005) Nat. Methods 2:905-909), and the like. Any of a variety of fluorescent and colored proteins from Anthozoan species, as described in, e.g., Matz et al. (1999) Nature Biotechnol. 17:969-973, is suitable for use.

Suitable enzymes include, but are not limited to, horseradish peroxidase (HRP), alkaline phosphatase (AP), beta-galactosidase (GAL), glucose-6-phosphate dehydrogenase, beta-N-acetylglucosaminidase, β-glucuronidase, invertase, xanthine oxidase, firefly luciferase, glucose oxidase (GO), and the like.

Protospacer Adjacent Motif (PAM)

A natural CasZ protein binds to target DNA at a target sequence defined by the region of complementarity between the DNA-targeting RNA and the target DNA. As is the case for many CRISPR endonucleases, site-specific binding (and/or cleavage) of a double stranded target DNA occurs at locations determined by both (i) base-pairing complementarity between the guide RNA and the target DNA; and (ii) a short motif [referred to as the protospacer adjacent motif (PAM)] in the target DNA.

In some cases, the PAM for a CasZ protein is immediately 5′ of the target sequence of the non-complementary strand of the target DNA (also referred to as the non-target strand; the complementary strand hybridizes to the guide sequence of the guide RNA while the non-complementary strand does not directly hybridize with the guide RNA and is the reverse complement of the complementary strand). In some cases (e.g., for CasZc), the PAM sequence of the non-complementary strand is 5′-TTA-3′. In some cases (e.g., for CasZb), the PAM sequence of the non-complementary strand is 5′-TTTN-3′. In some cases (e.g., for CasZb), the PAM sequence of the non-complementary strand is 5′-TTTA-3′.
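As a non-limiting illustration of the PAM placement described above, the following Python sketch scans a non-target strand, written 5′ to 3′, for a PAM that is immediately 5′ of a candidate target sequence. The function name find_target_sites, the default target length of 20 nt, and the decision to scan only the supplied strand (not its reverse complement) are assumptions made for this example.

```python
import re

# Minimal IUPAC table: only the symbols needed for the PAMs named in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "N": "[ACGT]"}


def find_target_sites(non_target_strand: str, pam: str = "TTTN", target_len: int = 20):
    """Return (pam_start, pam, target) for each PAM immediately 5' of a target.

    The strand is read 5'->3'. A lookahead is used so overlapping sites are found.
    """
    pam_pattern = "".join(IUPAC[base] for base in pam.upper())
    regex = re.compile(f"(?=({pam_pattern})([ACGT]{{{target_len}}}))")
    return [
        (match.start(), match.group(1), match.group(2))
        for match in regex.finditer(non_target_strand.upper())
    ]


# Hypothetical input sequence, constructed only to exercise the function.
example = "GG" + "TTTA" + "C" * 20 + "GG"
for site in find_target_sites(example, pam="TTTN"):
    print(site)
```

Passing pam="TTA" searches instead for the shorter motif given above for CasZc. A fuller treatment would also scan the reverse complement of the input.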

In some cases, different CasZ proteins (i.e., CasZ proteins from various species) may be advantageous to use in the various provided methods in order to capitalize on various enzymatic characteristics of the different CasZ proteins (e.g., for different PAM sequence preferences; for increased or decreased enzymatic activity; for an increased or decreased level of cellular toxicity; to change the balance between NHEJ, homology-directed repair, single strand breaks, double strand breaks, etc.; to take advantage of a short total sequence; and the like). CasZ proteins from different species may require different PAM sequences in the target DNA. Thus, for a particular CasZ protein of choice, the PAM sequence preference may be different than the sequence(s) described above. Various methods (including in silico and/or wet lab methods) for identification of the appropriate PAM sequence are known in the art and are routine, and any convenient method can be used.

CasZ Guide RNA

A nucleic acid molecule that binds to a CasZ protein, forming a ribonucleoprotein complex (RNP), and targets the complex to a specific location within a target nucleic acid (e.g., a target DNA) is referred to herein as a “CasZ guide RNA” or simply as a “guide RNA.” It is to be understood that in some cases, a hybrid DNA/RNA can be made such that a CasZ guide RNA includes DNA bases in addition to RNA bases, but the term “CasZ guide RNA” is still used to encompass such a molecule herein.

A CasZ guide RNA can be said to include two segments, a targeting segment and a protein-binding segment. The targeting segment of a CasZ guide RNA includes a nucleotide sequence (a guide sequence) that is complementary to (and therefore hybridizes with) a specific sequence (a target site) within a target nucleic acid (e.g., a target ssRNA, a target ssDNA, the complementary strand of a double stranded target DNA, etc.). The protein-binding segment (or “protein-binding sequence”) interacts with (binds to) a CasZ polypeptide. The protein-binding segment of a subject CasZ guide RNA includes two complementary stretches of nucleotides that hybridize to one another to form a double stranded RNA duplex (dsRNA duplex). Site-specific binding and/or cleavage of a target nucleic acid (e.g., genomic DNA) can occur at locations (e.g., target sequence of a target locus) determined by base-pairing complementarity between the CasZ guide RNA (the guide sequence of the CasZ guide RNA) and the target nucleic acid.

A CasZ guide RNA and a CasZ protein, e.g., a fusion CasZ polypeptide, form a complex (e.g., bind via non-covalent interactions). The CasZ guide RNA provides target specificity to the complex by including a targeting segment, which includes a guide sequence (a nucleotide sequence that is complementary to a sequence of a target nucleic acid). The CasZ protein of the complex provides the site-specific activity (e.g., cleavage activity provided by the CasZ protein and/or an activity provided by the fusion partner in the case of a chimeric CasZ protein). In other words, the CasZ protein is guided to a target nucleic acid sequence (e.g., a target sequence) by virtue of its association with the CasZ guide RNA.

The “guide sequence,” also referred to as the “targeting sequence,” of a CasZ guide RNA can be modified so that the CasZ guide RNA can target a CasZ protein (e.g., a naturally occurring CasZ protein, a fusion CasZ polypeptide (chimeric CasZ), and the like) to any desired sequence of any desired target nucleic acid, with the exception (e.g., as described herein) that the PAM sequence can be taken into account. Thus, for example, a CasZ guide RNA can have a guide sequence with complementarity to (e.g., can hybridize to) a sequence in a nucleic acid in a eukaryotic cell, e.g., a viral nucleic acid, a eukaryotic nucleic acid (e.g., a eukaryotic chromosome, chromosomal sequence, a eukaryotic RNA, etc.), and the like.

In some cases, a CasZ guide RNA has a length of 30 nucleotides (nt) or more (e.g., 35 nt or more, 40 nt or more, 45 nt or more, 50 nt or more, 55 nt or more, or 60 nt or more). In some embodiments, a CasZ guide RNA has a length of 40 nucleotides (nt) or more (e.g., 45 nt or more, 50 nt or more, 55 nt or more, or 60 nt or more). In some cases, a CasZ guide RNA has a length of from 30 nucleotides (nt) to 100 nt (e.g., 30-90, 30-80, 30-75, 30-70, 30-65, 40-100, 40-90, 40-80, 40-75, 40-70, or 40-65 nt). In some cases, a CasZ guide RNA has a length of from 40 nucleotides (nt) to 100 nt (e.g., 40-90, 40-80, 40-75, 40-70, or 40-65 nt).

Guide Sequence of a CasZ Guide RNA

A subject CasZ guide RNA includes a guide sequence (i.e., a targeting sequence), which is a nucleotide sequence that is complementary to a sequence (a target site) in a target nucleic acid. In other words, the guide sequence of a CasZ guide RNA can interact with a target nucleic acid (e.g., double stranded DNA (dsDNA), single stranded DNA (ssDNA), single stranded RNA (ssRNA), or double stranded RNA (dsRNA)) in a sequence-specific manner via hybridization (i.e., base pairing). The guide sequence of a CasZ guide RNA can be modified (e.g., by genetic engineering)/designed to hybridize to any desired target sequence (e.g., while taking the PAM into account, e.g., when targeting a dsDNA target) within a target nucleic acid (e.g., a eukaryotic target nucleic acid such as genomic DNA).

In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 65% or more, 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%). In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%). In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%). In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100%.

In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over the seven contiguous 3′-most nucleotides of the target site of the target nucleic acid.
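By way of non-limiting illustration, percent complementarity between a guide sequence and a target site can be computed as sketched below in Python. The sketch assumes an RNA guide and a DNA target site of equal length, both written 5′ to 3′, antiparallel pairing, and Watson-Crick pairs only (no G-U wobble and no gaps). The function names and the example sequences are hypothetical.

```python
# RNA guide base -> DNA target base it pairs with (Watson-Crick only; an assumption).
PAIR = {"A": "T", "U": "A", "G": "C", "C": "G"}


def paired_positions(guide_rna: str, target_dna: str) -> list:
    """Per-position pairing flags, indexed from the 5' end of the guide.

    Both sequences are written 5'->3'; pairing is antiparallel, so guide
    position i faces target position len - 1 - i.
    """
    if len(guide_rna) != len(target_dna):
        raise ValueError("this sketch compares equal-length sequences only")
    target_3_to_5 = target_dna.upper()[::-1]
    return [PAIR.get(g) == t for g, t in zip(guide_rna.upper(), target_3_to_5)]


def percent_complementarity(guide_rna: str, target_dna: str) -> float:
    flags = paired_positions(guide_rna, target_dna)
    return 100.0 * sum(flags) / len(flags)


def three_prime_most_fully_paired(guide_rna: str, target_dna: str, n: int = 7) -> bool:
    """True if the n 3'-most target nucleotides are all paired.

    Because pairing is antiparallel, these face the n 5'-most guide nucleotides.
    """
    return all(paired_positions(guide_rna, target_dna)[:n])


target = "ACGTACGTACGTACGTACGT"             # hypothetical target site
guide = "ACGUACGUACGUACGUACGG"              # one mismatch at the guide 3' end
print(percent_complementarity(guide, target))
print(three_prime_most_fully_paired(guide, target))
```

In this example the single mismatch faces the 5′-most target nucleotide, so overall complementarity drops below 100% while the seven 3′-most target nucleotides remain fully paired.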

In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17 or more (e.g., 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17 or more (e.g., 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17 or more (e.g., 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 17 or more (e.g., 18 or more, 19 or more, 20 or more, 21 or more, 22 or more) contiguous nucleotides.

In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19 or more (e.g., 20 or more, 21 or more, 22 or more) contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19 or more (e.g., 20 or more, 21 or more, 22 or more) contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19 or more (e.g., 20 or more, 21 or more, 22 or more) contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 19 or more (e.g., 20 or more, 21 or more, 22 or more) contiguous nucleotides.

In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17-25 contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17-25 contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 17-25 contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 17-25 contiguous nucleotides.

In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 60% or more (e.g., 70% or more, 75% or more, 80% or more, 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19-25 contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 80% or more (e.g., 85% or more, 90% or more, 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19-25 contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 90% or more (e.g., 95% or more, 97% or more, 98% or more, 99% or more, or 100%) over 19-25 contiguous nucleotides. In some cases, the percent complementarity between the guide sequence and the target site of the target nucleic acid is 100% over 19-25 contiguous nucleotides.
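One possible reading of complementarity "over" a stated number of contiguous nucleotides is that some stretch of that many contiguous positions meets the stated percentage. Under that assumption, which is an interpretation and not a definition from the disclosure, the Python sketch below slides a window along per-position pairing flags (True where a guide nucleotide pairs with the facing target nucleotide) and reports the best window. The function name best_window_complementarity is hypothetical.

```python
def best_window_complementarity(paired: list, window: int) -> float:
    """Highest percent complementarity over any `window` contiguous positions.

    `paired` holds one boolean per aligned position (True = base paired).
    """
    if not 1 <= window <= len(paired):
        raise ValueError("window must be between 1 and the number of positions")
    count = sum(paired[:window])   # paired positions in the first window
    best = count
    for i in range(window, len(paired)):
        # Slide the window by one: add the entering flag, drop the leaving flag.
        count += paired[i] - paired[i - window]
        best = max(best, count)
    return 100.0 * best / window


# Hypothetical 20-position alignment: 17 paired positions, then 3 mismatches.
flags = [True] * 17 + [False] * 3
print(best_window_complementarity(flags, 17))
print(best_window_complementarity(flags, 20))
```

A threshold such as "90% or more over 17 or more contiguous nucleotides" could then be tested by comparing the returned value against 90 for each window length of interest.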

In some cases, the guide sequence has a length in a range of from 17-30 nucleotides (nt) (e.g., from 17-25, 17-22, 17-20, 19-30, 19-25, 19-22, 19-20, 20-30, 20-25, or 20-22 nt). In some cases, the guide sequence has a length in a range of from 17-25 nucleotides (nt) (e.g., from 17-22, 17-20, 19-25, 19-22, 19-20, 20-25, or 20-22 nt). In some cases, the guide sequence has a length of 17 or more nt (e.g., 18 or more, 19 or more, 20 or more, 21 or more, or 22 or more nt; 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, etc.). In some cases, the guide sequence has a length of 19 or more nt (e.g., 20 or more, 21 or more, or 22 or more nt; 19 nt, 20 nt, 21 nt, 22 nt, 23 nt, 24 nt, 25 nt, etc.). In some cases, the guide sequence has a length of 17 nt. In some cases, the guide sequence has a length of 18 nt. In some cases, the guide sequence has a length of 19 nt. In some cases, the guide sequence has a length of 20 nt. In some cases, the guide sequence has a length of 21 nt. In some cases, the guide sequence has a length of 22 nt. In some cases, the guide sequence has a length of 23 nt.

Protein-Binding Segment of a CasZ Guide RNA

The protein-binding segment of a subject CasZ guide RNA interacts with a CasZ protein. The CasZ guide RNA guides the bound CasZ protein to a specific nucleotide sequence within a target nucleic acid via the above-mentioned guide sequence. The protein-binding segment of a CasZ guide RNA comprises two stretches of nucleotides that are complementary to one another and hybridize to form a double stranded RNA duplex (dsRNA duplex). Thus, the protein-binding segment includes a dsRNA duplex.

In some cases, the dsRNA duplex region includes a range of from 5-25 base pairs (bp) (e.g., from 5-22, 5-20, 5-18, 5-15, 5-12, 5-10, 5-8, 8-25, 8-22, 8-18, 8-15, 8-12, 12-25, 12-22, 12-18, 12-15, 13-25, 13-22, 13-18, 13-15, 14-25, 14-22, 14-18, 14-15, 15-25, 15-22, 15-18, 17-25, 17-22, or 17-18 bp, e.g., 5 bp, 6 bp, 7 bp, 8 bp, 9 bp, 10 bp, etc.). In some cases, the dsRNA duplex region includes a range of from 6-15 base pairs (bp) (e.g., from 6-12, 6-10, or 6-8 bp, e.g., 6 bp, 7 bp, 8 bp, 9 bp, 10 bp, etc.). In some cases, the duplex region includes 5 or more bp (e.g., 6 or more, 7 or more, or 8 or more bp). In some cases, the duplex region includes 6 or more bp (e.g., 7 or more, or 8 or more bp). In some cases, not all nucleotides of the duplex region are paired, and therefore the duplex forming region can include a bulge. The term “bulge” herein is used to mean a stretch of nucleotides (which can be one nucleotide or multiple nucleotides) that do not contribute to a double stranded duplex, but which are surrounded 5′ and 3′ by nucleotides that do contribute, and as such a bulge is considered part of the duplex region. In some cases, the dsRNA includes 1 or more bulges (e.g., 2 or more, 3 or more, 4 or more bulges). In some cases, the dsRNA duplex includes 2 or more bulges (e.g., 3 or more, 4 or more bulges). In some cases, the dsRNA duplex includes 1-5 bulges (e.g., 1-4, 1-3, 2-5, 2-4, or 2-3 bulges).

Thus, in some cases, the stretches of nucleotides that hybridize to one another to form the dsRNA duplex have 70%-100% complementarity (e.g., 75%-100%, 80%-100%, 85%-100%, 90%-100%, 95%-100% complementarity) with one another. In some cases, the stretches of nucleotides that hybridize to one another to form the dsRNA duplex have 85%-100% complementarity (e.g., 90%-100%, 95%-100% complementarity) with one another. In some cases, the stretches of nucleotides that hybridize to one another to form the dsRNA duplex have 70%-95% complementarity (e.g., 75%-95%, 80%-95%, 85%-95%, 90%-95% complementarity) with one another.

In other words, in some embodiments, the dsRNA duplex includes two stretches of nucleotides that have 70%-100% complementarity (e.g., 75%-100%, 80%-100%, 85%-100%, 90%-100%, 95%-100% complementarity) with one another. In some cases, the dsRNA duplex includes two stretches of nucleotides that have 85%-100% complementarity (e.g., 90%-100%, 95%-100% complementarity) with one another. In some cases, the dsRNA duplex includes two stretches of nucleotides that have 70%-95% complementarity (e.g., 75%-95%, 80%-95%, 85%-95%, 90%-95% complementarity) with one another.
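As a non-limiting sketch, the complementarity of two duplex-forming stretches with one another can be estimated in Python as follows. The sketch assumes two equal-length stretches written 5′ to 3′, antiparallel pairing, Watson-Crick pairs only (no G-U wobble), and no bulges. The function name duplex_complementarity is hypothetical. The example inputs are the six 5′-most and six 3′-most nucleotides of the Za.1 repeat of Table 1, written as RNA, used purely as convenient strings; no statement is made here about which nucleotides actually form the duplex.

```python
RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def duplex_complementarity(stretch_a: str, stretch_b: str) -> float:
    """Percent of positions that pair when two 5'->3' stretches are antiparallel.

    DNA-style input (T) is accepted and converted to RNA (U).
    """
    a = stretch_a.upper().replace("T", "U")
    b_3_to_5 = stretch_b.upper().replace("T", "U")[::-1]
    if len(a) != len(b_3_to_5):
        raise ValueError("this sketch compares equal-length stretches only")
    paired = sum(RNA_PAIR.get(x) == y for x, y in zip(a, b_3_to_5))
    return 100.0 * paired / len(a)


print(duplex_complementarity("GUUGCA", "UGCAAC"))  # fully complementary stretches
print(duplex_complementarity("GUUGCA", "UGCAAG"))  # one position fails to pair
```

Handling bulges, as described above for the duplex region, would require an alignment step that this sketch omits.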

The duplex region of a subject CasZ guide RNA can include one or more (1, 2, 3, 4, 5, etc.) mutations relative to a naturally occurring duplex region. For example, in some cases a base pair can be maintained while the nucleotides contributing to the base pair from each segment can be different. In some cases, the duplex region of a subject CasZ guide RNA includes more paired bases, fewer paired bases, a smaller bulge, a larger bulge, fewer bulges, more bulges, or any convenient combination thereof, as compared to a naturally occurring duplex region (of a naturally occurring CasZ guide RNA).

Examples of various Cas9 guide RNAs and Cpf1 guide RNAs can be found in the art, and in some cases variations similar to those introduced into Cas9 guide RNAs can also be introduced into CasZ guide RNAs of the present disclosure (e.g., mutations to the dsRNA duplex region, extension of the 5′ or 3′ end for added stability or to provide for interaction with another protein, and the like). For example, see Jinek et al., Science. 2012 Aug. 17; 337(6096):816-21; Chylinski et al., RNA Biol. 2013 May; 10(5):726-37; Ma et al., Biomed Res Int. 2013; 2013:270805; Hou et al., Proc Natl Acad Sci USA. 2013 Sep. 24; 110(39):15644-9; Jinek et al., Elife. 2013; 2:e00471; Pattanayak et al., Nat Biotechnol. 2013 September; 31(9):839-43; Qi et al., Cell. 2013 Feb. 28; 152(5):1173-83; Wang et al., Cell. 2013 May 9; 153(4):910-8; Auer et al., Genome Res. 2013 Oct. 31; Chen et al., Nucleic Acids Res. 2013 Nov. 1; 41(20):e19; Cheng et al., Cell Res. 2013 October; 23(10):1163-71; Cho et al., Genetics. 2013 November; 195(3):1177-80; DiCarlo et al., Nucleic Acids Res. 2013 April; 41(7):4336-43; Dickinson et al., Nat Methods. 2013 October; 10(10):1028-34; Ebina et al., Sci Rep. 2013; 3:2510; Fujii et al., Nucleic Acids Res. 2013 Nov. 1; 41(20):e187; Hu et al., Cell Res. 2013 November; 23(11):1322-5; Jiang et al., Nucleic Acids Res. 2013 Nov. 1; 41(20):e188; Larson et al., Nat Protoc. 2013 November; 8(11):2180-96; Mali et al., Nat Methods. 2013 October; 10(10):957-63; Nakayama et al., Genesis. 2013 December; 51(12):835-43; Ran et al., Nat Protoc. 2013 November; 8(11):2281-308; Ran et al., Cell. 2013 Sep. 12; 154(6):1380-9; Upadhyay et al., G3 (Bethesda). 2013 Dec. 9; 3(12):2233-8; Walsh et al., Proc Natl Acad Sci USA. 2013 Sep. 24; 110(39):15514-5; Xie et al., Mol Plant. 2013 Oct. 9; Yang et al., Cell. 2013 Sep. 12; 154(6):1370-9; Briner et al., Mol Cell. 2014 Oct. 23; 56(2):333-9; and U.S. patents and patent applications: U.S. Pat. Nos. 8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445; 8,865,406; 8,795,965; 8,771,945; 8,697,359; 20140068797; 20140170753; 20140179006; 20140179770; 20140186843; 20140186919; 20140186958; 20140189896; 20140227787; 20140234972; 20140242664; 20140242699; 20140242700; 20140242702; 20140248702; 20140256046; 20140273037; 20140273226; 20140273230; 20140273231; 20140273232; 20140273233; 20140273234; 20140273235; 20140287938; 20140295556; 20140295557; 20140298547; 20140304853; 20140309487; 20140310828; 20140310830; 20140315985; 20140335063; 20140335620; 20140342456; 20140342457; 20140342458; 20140349400; 20140349405; 20140356867; 20140356956; 20140356958; 20140356959; 20140357523; 20140357530; 20140364333; and 20140377868; all of which are hereby incorporated by reference in their entirety.

A CasZ guide RNA comprises both the guide sequence and two stretches (“duplex-forming segments”) of nucleotides that hybridize to form the dsRNA duplex of the protein-binding segment. The particular sequence of a given CasZ guide RNA can be characteristic of the species in which a crRNA is found. Examples of suitable CasZ guide RNAs are provided herein.

Example Guide RNA Sequences

Repeat sequences (non-guide sequence portion of a CasZ guide RNA) of crRNAs for naturally existing CasZ proteins (e.g., see FIG. 1 and FIG. 7) are shown in Table 1 and Table 3.

TABLE 1
crRNA repeat sequences for CasZ proteins

CasZ Protein  Repeat sequence                            SEQ ID NO:
Za.1          GTTGCATTCCTTCATTCGTCTATTCGGGTTCTGCAAC      51
Za.2          GTTGCATTCCTTCATTCGTCTATCCGGGTTCTGCAAG      52
Za.3          GTTGCAGAACCCGAATAGACGAATGAAGGAATGCAAC      53
Za.4          CTATCATATTCAGAACAAAGGGATTAAGGAATGCAAC      54
Za.5          CTTTCATACTCAGAACAAAGGGATTAAGGAATGCAAC      55
Za.6          GTCTACAACTCATTGATAGAAATCAATGAGTTAGACA      56
Za.7          GTTATAAAGGCGGGGATCGCGACCGAGCGATTGAAAG      57
Zb.1          GTTGCATTCCTTAATTCATTTTCTCAATATCGGAAAC      58
Zb.2          GTTGCAGAAATAGAATAAAGGAATTAAGGAATGCAAC      59
Zb.3          CTTTCATACTCAGAACAAAGGGATTAAGGAATGCAAC      55
Zb.4          ATTTCATACTCAGAACAAAGGGATTAAGGAATGCAAC      61
Zb.5          GTTTCAGCGCACGAATTAACGAGATGAGAGATGCAACT     62
Zb.6          CTTGCAGAAGCTGAATAGACGAATCAAGGAATGCAAC      63
Zb.7          CACTTGCAGGCCTTGAATAGAGGAGTTAAGGAATGCAAC    64
Zb.8          GTCTCCATGACTGAAAAGTCGTGGCCGAATTGAAAC       65
Zb.9          GTTGCAGCGCCCGAACTGACGAGACGAGAGATGCAAC      66
Zb.10         GTTGCGCGAATAGAATAAAGGAATTAAGGAATGCAAC      67
Zb.11         AGTTGCATTCCTTAATCCCTCTGTTCAGTTTGTGCAAT     68
Zc.1          GTTGCATTCCTAGTTTCTCTAATTAGCACTGTGCAAC      69
Zc.2          GTTGCGGCGCGCGAATAAACGAGACTAGGAATGCAAC      70
Zc.3          ACTAGTTGCATTCCTTAATCCCTTTGTTCTGAATATGCTAG  71
Zc.4          CTTTCATATTCAGAACAAAGGGATTAAGGAATGCAAC      72
Zc.5          GTTGCAGTCCTTAACCCCTAGTTTCTGAATATGAAAGAT    73
Zc.6          GTTGCAGCCCCCGAACTAACGAGATGAGAGATGCAAC      74
Zc.7          CTTGCAGAACAATCATATATGACTAATCAGACTGCAAC     75
Zd.1          GTTGCACTCACCGGTGCTCACGACGTAGGGATGCAAC      76
Zd.2          GTCCCTACTCGCTAGGGAAACTAATTGAATGGAAAC       77
Ze.1          GTTGCATTCGGGTGCAAAACAGGGAGTAGAGTGTAAC      78
Ze.2          CTTCCAAACTCGAGCCAGTGGGGAGAGAAGTGGCA        79
Ze.3          CCTGTAGACCGGTCTCATTCTGAGAGGGGTATGCAACT     80
Ze.4          GTCTCGAGACCCTACAGATTTTGGAGAGGGGTGGGAC      81
Ze.4b         GTCCCACCCCTCTCCAAAATCTGTAGGGTCTCGAGAC      82
Zf.1          GTAGCAGGACTCTCCTCGAGAGAAACAGGGGTATGCT      83
Zf.2          GTACAATACCTCTCCTTTAAGAGAGGGAGGGGTACGCTAC   84
Zf.3          CCCCCTCGTTTCCTTCAGGGGATTCCTTTCC            85
Zg.1          GGTTCCCCCGGGCGCGGGTGGGGTGGCG               86
Zg.2          GGCTGCTCCGGGTGCGCGTGGAGCGAGG               87
Zh.1          GTTTTATACCCTTTAGAATTTAAACTGTCTAAAAG        88
Zi.1          ATTGCACCGGCCAACGCAAATCTGATTGATGGACAC       89
Zi.2          GCCGCAGCGGCCGACGCGGCCCTGATCGATGGACAC       90
Zj.1          GTCGAAATGCCCGCGCGGGGGCGTCGTACCCGCGAC       91
Zk.1          GGCTAGCCCGTGCGCGCAGGGACGAGTGG              92
Zk.2          GCCCGTGCGCGCAGGGACGAGTGG                   93
Zk.3          GTTGCAGCGGCCGACGGAGCGCGAGCGTGGATGCCAC      94
Zk.4          CCATCGCCCCGCGCGCACGTGGATGAGCC              95
Zl.1          CTTTAGACTTCTCCGGAAGTCGAATTAATGGAAAC        96
Zl.2          GGGCGCCCCGCGCGAGCGGGGGTTGAAG               97
Za.8          CTTGCAGAACCCGGATAGACGAATGAAGGAATGCAAC      295
Zb.12         CTTGCAGGCCTTGAATAGAGGAGTTAAGGAATGCAAC      296
Zb.13         GTTGCACAGTGCTAATTAGAGAAACTAGGAATGCAAC      297
Zb.14         CTAGCATATTCAGAACAAAGGGATTAAGGAATGCAAC      298
Zb.15         CTTTCATATTCAGAAACTAGGGGTTAAGGACTGCAAC      299
Zc.8          GTTGCATCCCTACGTCGTGAGCACCGGTGAGTGCAAC      300
Ze.5          GGAAAGGAATCCCCTGAAGGAAACGAGGGGG            301
Zg.3          GTGTCCATCAATCAGATTTGCGTTGGCCGGTGCAAT       302
Zb.16         GTTTCAGCGCACGAATTAACGAGATGAGAGATGCAAC      303
Zj.2          CTTTTAGACAGTTTAAATTCTAAAGGGTATAAAAC        307

In some cases, a subject CasZ guide RNA comprises (e.g., in addition to a guide sequence, e.g., as part of the protein-binding region) a crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a crRNA sequence of Table 1 or Table 3.
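By way of non-limiting illustration, percent identity between two repeat sequences can be sketched in Python as below. The sketch assumes an ungapped, position-by-position comparison of equal-length sequences, which is a simplification; sequences of differing length would require an alignment method that is not shown. The function name percent_identity is hypothetical, and the inputs are the Za.1 and Za.2 repeat sequences of Table 1, read here as contiguous 37-nt sequences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Ungapped percent identity between two equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this sketch compares equal-length sequences only")
    matches = sum(a == b for a, b in zip(seq_a.upper(), seq_b.upper()))
    return 100.0 * matches / len(seq_a)


# Za.1 and Za.2 repeat sequences as listed in Table 1 (SEQ ID NOs: 51 and 52).
za1 = "GTTGCATTCCTTCATTCGTCTATTCGGGTTCTGCAAC"
za2 = "GTTGCATTCCTTCATTCGTCTATCCGGGTTCTGCAAG"

identity = percent_identity(za1, za2)
print(f"{identity:.1f}% identity")
print(identity >= 90.0)  # compare against a chosen threshold, e.g., 90% or more
```

The same comparison can be repeated against each threshold recited above (70%, 80%, or 90% or more).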

In some cases, a subject CasZ guide RNA comprises (e.g., in addition to a guide sequence, e.g., as part of the protein-binding region) a CasZa, CasZb, CasZc, CasZd, CasZe, CasZf, CasZg, CasZh, or CasZi crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZa, CasZb, CasZc, CasZd, CasZe, CasZf, CasZg, CasZh, or CasZi crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZa, CasZb, CasZc, CasZd, CasZe, CasZf, CasZg, CasZh, or CasZi crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZa, CasZb, CasZc, CasZd, CasZe, CasZf, CasZg, CasZh, or CasZi crRNA sequence of Table 1 or Table 3.

In some cases, a subject CasZ guide RNA comprises (e.g., in addition to a guide sequence, e.g., as part of the protein-binding region) a CasZa crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZa crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZa crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZa crRNA sequence of Table 1 or Table 3.

In some cases, a subject CasZ guide RNA comprises (e.g., in addition to a guide sequence, e.g., as part of the protein-binding region) a CasZb crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZb crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZb crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZb crRNA sequence of Table 1 or Table 3.

In some cases, a subject CasZ guide RNA comprises (e.g., in addition to a guide sequence, e.g., as part of the protein-binding region) a CasZc crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZc crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZc crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZc crRNA sequence of Table 1 or Table 3.

In some cases, a subject CasZ guide RNA comprises (e.g., in addition to a guide sequence, e.g., as part of the protein-binding region) a CasZd crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZd crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZd crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZd crRNA sequence of Table 1 or Table 3.

In some cases, a subject CasZ guide RNA comprises (e.g., in addition to a guide sequence, e.g., as part of the protein-binding region) a CasZe crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZe crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZe crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZe crRNA sequence of Table 1 or Table 3.

In some cases, a subject CasZ guide RNA comprises (e.g., in addition to a guide sequence, e.g., as part of the protein-binding region) a CasZf crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZf crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZf crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZf crRNA sequence of Table 1 or Table 3.

In some cases, a subject CasZ guide RNA comprises (e.g., in addition to a guide sequence, e.g., as part of the protein-binding region) a CasZg crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZg crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZg crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZg crRNA sequence of Table 1 or Table 3.

In some cases, a subject CasZ guide RNA comprises (e.g., in addition to a guide sequence, e.g., as part of the protein-binding region) a CasZh crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZh crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZh crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZh crRNA sequence of Table 1 or Table 3.

In some cases, a subject CasZ guide RNA comprises (e.g., in addition to a guide sequence, e.g., as part of the protein-binding region) a CasZi crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZi crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZi crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZi crRNA sequence of Table 1 or Table 3.

In some cases, a subject CasZ guide RNA comprises (e.g., in addition to a guide sequence, e.g., as part of the protein-binding region) a CasZj crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZj crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZj crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZj crRNA sequence of Table 1 or Table 3.

In some cases, a subject CasZ guide RNA comprises (e.g., in addition to a guide sequence, e.g., as part of the protein-binding region) a CasZk crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZk crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZk crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZk crRNA sequence of Table 1 or Table 3.

In some cases, a subject CasZ guide RNA comprises (e.g., in addition to a guide sequence, e.g., as part of the protein-binding region) a CasZl crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZl crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZl crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZl crRNA sequence of Table 1 or Table 3.

In some cases, a subject CasZ guide RNA comprises (e.g., in addition to a guide sequence, e.g., as part of the protein-binding region) a CasZj, CasZl, or CasZk crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZj, CasZl, or CasZk crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZj, CasZl, or CasZk crRNA sequence of Table 1 or Table 3. In some cases, a subject CasZ guide RNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZj, CasZl, or CasZk crRNA sequence of Table 1 or Table 3.

CasZ Transactivating Noncoding RNA (trancRNA)

Compositions and methods of the present disclosure include a CasZ transactivating noncoding RNA (“trancRNA”; also referred to herein as a “CasZ trancRNA”). In some cases, a trancRNA forms a complex with a CasZ polypeptide of the present disclosure and a CasZ guide RNA. A trancRNA can be identified as a highly transcribed RNA encoded by a nucleotide sequence present in a CasZ locus. The sequence encoding a trancRNA is usually located between the cas genes and the array of the CasZ locus (the repeats) (e.g., can be located adjacent to the repeat sequences). Examples below demonstrate detection of a CasZ trancRNA. In some cases, a CasZ trancRNA co-immunoprecipitates with (forms a complex with) a CasZ polypeptide. In some cases, the presence of a CasZ trancRNA is required for function of the system. Data related to trancRNAs (e.g., their expression and their location on naturally occurring arrays) are presented in the examples section below.

In some cases, a CasZ trancRNA has a length of from 60 nucleotides (nt) to 270 nt (e.g., 60-260, 70-270, 70-260, or 75-255 nt). In some cases, a CasZ trancRNA (e.g., a CasZa trancRNA) has a length of from 60-150 nt (e.g., 60-140, 60-130, 65-150, 65-140, 65-130, 70-150, 70-140, or 70-130 nt). In some cases, a CasZ trancRNA (e.g., a CasZa trancRNA) has a length of from 70-130 nt. In some cases, a CasZ trancRNA (e.g., a CasZa trancRNA) has a length of about 80 nt. In some cases, a CasZ trancRNA (e.g., a CasZa trancRNA) has a length of about 90 nt. In some cases, a CasZ trancRNA (e.g., a CasZa trancRNA) has a length of about 120 nt.

In some cases, a CasZ trancRNA (e.g., a CasZb trancRNA) has a length of from 85-240 nt (e.g., 85-230, 85-220, 85-150, 85-130, 95-240, 95-230, 95-220, 95-150, or 95-130 nt). In some cases, a CasZ trancRNA (e.g., a CasZb trancRNA) has a length of from 95-120 nt. In some cases, a CasZ trancRNA (e.g., a CasZb trancRNA) has a length of about 105 nt. In some cases, a CasZ trancRNA (e.g., a CasZb trancRNA) has a length of about 115 nt. In some cases, a CasZ trancRNA (e.g., a CasZb trancRNA) has a length of about 215 nt.

In some cases, a CasZ trancRNA (e.g., a CasZc trancRNA) has a length of from 80-275 nt (e.g., 85-260 nt). In some cases, a CasZ trancRNA (e.g., a CasZc trancRNA) has a length of from 80-110 nt (e.g., 85-105 nt). In some cases, a CasZ trancRNA (e.g., a CasZc trancRNA) has a length of from 235-270 nt (e.g., 240-260 nt). In some cases, a CasZ trancRNA (e.g., a CasZc trancRNA) has a length of about 95 nt. In some cases, a CasZ trancRNA (e.g., a CasZc trancRNA) has a length of about 250 nt.

Example trancRNA Sequences

Examples of trancRNA sequences for naturally existing CasZ proteins are shown in Table 2.

TABLE 2 CasZ trancRNA sequences

CasZ Protein | SEQ ID NO | trancRNA sequence
Za.1 | 151 | CGATTCCTCCCTACAGTAGTTAGGTATAGCCGAAAGGTAGAGACTAAATCTGTAGTTGGAGTGGGCCGCTTGCATCGGCC
Za.2 | 152 | TCGTCTCGAGGGTTACCAAAATTGGCACTTCTCGACTTTAGGCCGATGCAAGCGGCCCACTCCACTACAGATTTAGTCTCTACCTTGCGGCTATACCTAACTTACTGTAGGGAGGAATCGTG
Za.3 | 153 | CTTCACTGATAAAGTGGAGAACCGCTTCACCAAAAGCTGTCCCTTAGGGGATTAGAACTTGAGTGAAGGTGGGCTGCTTGCATCAGCCTAA
Zb.2 | 154 | CAGAATAATACTGACTTACTAAGATATCTTGAGGGTATACCCGAAAAGATTGGCGTTGTTGCAACGCAATAAGATGTAAATCTGAAAAGGTTTGGAATCATATAAATAATTTTA
Zb.4 | 155 | AAGCCAAGATATGGAATGCCATTGTAATATTATGGTGTTGACTTAGTTTAGATTTAAACAATCTTCGATGGCTATATGCGGAAGGTTTGGCGTCGTTGTAACGC
Zb.6 | 156 | CAGTGTGCATAGCTATAACACTACGCAAAGACTGCTAAAGAGCGATGTGCTCTATCGCAGTCTCACCTTTAATGGACTTACGGATCTTTTGGAGCACTAAGCTCCGCTGCGGTGCAACACCGCCCTTTTCTTGCCTCTGCTTGCCCTTTCCGGTTATTATAGCCGGGAGAGTGCGGAAGATTACCGCTCTAGCTCGCAGCATGTTACTGAGTC
Zc.3 | 157 | GCAAGTCATTCGGGGACACTTTTTGTTATTTAAAGTGTTTTAGATAAATCAGTGTCATGCTGAATAACGACCCGACCTATAAATAACATAATCC
Zc.5 | 158 | GTCCTTAAGGTACTACACATTACATGTGAACGTGGAGCTAATAATAGAAATATTATTAGACTACACCTTATTAATAACGGTAGGAGATCTATATGGTCTTGAATGGAATAGTAATTGTGAAATTATAATTTCTGTTCTTAGCTACTTAAGATGGCTCGTTGCAAGCCACTCGGGGGCTCTCTTGAAGTCAAAGAGCTTTAGACAAATCAGTGTCAAACTGAATAACGACCCGACCATGACTTCATAATCCCG

In some cases, a subject CasZ trancRNA comprises a CasZa trancRNA sequence above. In some cases, a subject CasZ trancRNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZa trancRNA sequence above. In some cases, a subject CasZ trancRNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZa trancRNA sequence above. In some cases, a subject CasZ trancRNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZa trancRNA sequence above. In some cases, a subject CasZ trancRNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZa trancRNA sequence above, and has a length of from 60-150 nt (e.g., 60-140, 60-130, 65-150, 65-140, 65-130, 70-150, 70-140, or 70-130 nt).

In some cases, a subject CasZ trancRNA comprises a CasZb trancRNA sequence above. In some cases, a subject CasZ trancRNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZb trancRNA sequence above. In some cases, a subject CasZ trancRNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZb trancRNA sequence above. In some cases, a subject CasZ trancRNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZb trancRNA sequence above. In some cases, a subject CasZ trancRNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZb trancRNA sequence above, and has a length of from 85-240 nt (e.g., 85-230, 85-220, 85-150, 85-130, 95-240, 95-230, 95-220, 95-150, or 95-130 nt).

In some cases, a subject CasZ trancRNA comprises a CasZc trancRNA sequence above. In some cases, a subject CasZ trancRNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZc trancRNA sequence above. In some cases, a subject CasZ trancRNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZc trancRNA sequence above. In some cases, a subject CasZ trancRNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZc trancRNA sequence above. In some cases, a subject CasZ trancRNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZc trancRNA sequence above, and has a length of from 80-110 nt (e.g., 85-105 nt) or from 235-270 nt (e.g., 240-260 nt).

In some cases, a subject CasZ trancRNA comprises a CasZa, CasZb, or CasZc trancRNA sequence above. In some cases, a subject CasZ trancRNA comprises a nucleotide sequence having 70% or more identity (e.g., 75% or more, 80% or more, 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZa, CasZb, or CasZc trancRNA sequence above. In some cases, a subject CasZ trancRNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZa, CasZb, or CasZc trancRNA sequence above. In some cases, a subject CasZ trancRNA comprises a nucleotide sequence having 90% or more identity (e.g., 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZa, CasZb, or CasZc trancRNA sequence above. In some cases, a subject CasZ trancRNA comprises a nucleotide sequence having 80% or more identity (e.g., 85% or more, 90% or more, 93% or more, 95% or more, 97% or more, 98% or more, or 100% identity) with a CasZa, CasZb, or CasZc trancRNA sequence above, and has a length of from 60 nucleotides (nt) to 270 nt (e.g., 60-260, 70-270, 70-260, or 75-255 nt).
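As a non-limiting illustration of how a candidate sequence could be screened against the identity and length criteria recited above, the following Python sketch computes percent identity from a simple global (Needleman-Wunsch style) alignment and then applies a length window. The scoring values, the definition of identity as matches divided by alignment columns, and the function names `global_identity` and `meets_criteria` are assumptions made for this example only; this section does not prescribe a particular alignment method, and other alignment tools and parameters may give different values.

```python
def global_identity(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> float:
    """Percent identity over a global alignment (illustrative scoring)."""
    a, b = a.upper(), b.upper()
    n, m = len(a), len(b)
    # Dynamic-programming score matrix with linear gap penalties.
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    # Trace back one optimal alignment, counting matches and columns.
    i, j, matches, columns = n, m, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
                match if a[i - 1] == b[j - 1] else mismatch):
            matches += a[i - 1] == b[j - 1]
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1  # gap in b
        else:
            j -= 1  # gap in a
        columns += 1
    return 100.0 * matches / columns if columns else 100.0


def meets_criteria(candidate: str, reference: str,
                   min_identity: float = 80.0,
                   length_range: tuple = (60, 150)) -> bool:
    """True if candidate satisfies both the identity and length thresholds."""
    low, high = length_range
    return (low <= len(candidate) <= high
            and global_identity(candidate, reference) >= min_identity)
```

For example, `meets_criteria(candidate, reference, 80.0, (60, 150))` would correspond to the "80% or more identity ... and has a length of from 60-150 nt" case, with `reference` taken from Table 2.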

In some cases, a CasZ trancRNA comprises a modified nucleotide (e.g., methylated). In some cases, a CasZ trancRNA comprises one or more of: i) a base modification or substitution; ii) a backbone modification; iii) a modified internucleoside linkage; and iv) a modified sugar moiety. Possible nucleic acid modifications are described below.

CasZ Systems

The present disclosure provides a CasZ system. A CasZ system of the present disclosure can comprise one or more of: (1) a CasZ transactivating noncoding RNA (trancRNA) (referred to herein as a “CasZ trancRNA”) or a nucleic acid encoding the CasZ trancRNA (e.g., an expression vector); (2) a CasZ protein (e.g., a wild type protein, a variant, a catalytically compromised variant, a CasZ fusion protein, and the like) or a nucleic acid encoding the CasZ protein (e.g., an RNA, an expression vector, and the like); and (3) a CasZ guide RNA (that binds to and provides sequence specificity to the CasZ protein, e.g., a guide RNA that can bind to a target sequence of a eukaryotic genome) or a nucleic acid encoding the CasZ guide RNA (e.g., an expression vector). A CasZ system can include a host cell (e.g., a eukaryotic cell, a plant cell, a mammalian cell, a human cell) that comprises one or more of (1), (2), and (3) (in any combination), e.g., in some cases the host cell comprises a trancRNA and/or a nucleic acid encoding the trancRNA. In some cases, a CasZ system includes (e.g., in addition to the above) a donor template nucleic acid. In some cases, the CasZ system is a system of one or more nucleic acids (e.g., one or more expression vectors encoding any combination of the above).

Nucleic Acids

The present disclosure provides one or more nucleic acids comprising one or more of: a CasZ trancRNA sequence, a nucleotide sequence encoding a CasZ trancRNA, a nucleotide sequence encoding a CasZ polypeptide (e.g., a wild type CasZ protein, a nickase CasZ protein, a dCasZ protein, chimeric CasZ protein/CasZ fusion protein, and the like), a CasZ guide RNA sequence, a nucleotide sequence encoding a CasZ guide RNA, and a donor polynucleotide (donor template, donor DNA) sequence. In some cases, a subject nucleic acid (e.g., the one or more nucleic acids) is a recombinant expression vector (e.g., plasmid, viral vector, minicircle DNA, and the like). In some cases, the nucleotide sequence encoding the CasZ trancRNA, the nucleotide sequence encoding the CasZ protein, and/or the nucleotide sequence encoding the CasZ guide RNA is (are) operably linked to a promoter (e.g., an inducible promoter), e.g., one that is operable in a cell type of choice (e.g., a prokaryotic cell, a eukaryotic cell, a plant cell, an animal cell, a mammalian cell, a primate cell, a rodent cell, a human cell, etc.).

In some cases, a nucleotide sequence encoding a CasZ polypeptide of the present disclosure is codon optimized. This type of optimization can entail a mutation of a CasZ-encoding nucleotide sequence to mimic the codon preferences of the intended host organism or cell while encoding the same protein. Thus, the codons can be changed, but the encoded protein remains unchanged. For example, if the intended target cell were a human cell, a human codon-optimized CasZ-encoding nucleotide sequence could be used. As another non-limiting example, if the intended host cell were a mouse cell, then a mouse codon-optimized CasZ-encoding nucleotide sequence could be generated. As another non-limiting example, if the intended host cell were a plant cell, then a plant codon-optimized CasZ-encoding nucleotide sequence could be generated. As another non-limiting example, if the intended host cell were an insect cell, then an insect codon-optimized CasZ-encoding nucleotide sequence could be generated.
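The principle that the codons change while the encoded protein does not can be illustrated with a short Python sketch. It builds the standard genetic code, replaces each codon with a host-preferred synonymous codon, and confirms that the translation is unchanged. The function names and the `ILLUSTRATIVE_PREFERENCES` mapping are hypothetical placeholders chosen for this example, not a measured codon-usage table for any organism; a practical tool would typically also weigh factors such as GC content and repeated motifs.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Hypothetical host preferences (amino acid -> preferred codon); placeholder values.
ILLUSTRATIVE_PREFERENCES = {"L": "CTG", "A": "GCC", "R": "AGA"}


def translate(cds: str) -> str:
    """Translate a DNA coding sequence using the standard genetic code."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def codon_optimize(cds: str, preferred: dict) -> str:
    """Swap each codon for the preferred synonymous codon, if one is given."""
    cds = cds.upper()
    original_protein = translate(cds)  # also validates the length
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE[codon]
        new_codon = preferred.get(aa, codon)  # unchanged if no preference
        if CODON_TABLE[new_codon] != aa:
            raise ValueError(f"{new_codon} does not encode {aa}")
        out.append(new_codon)
    optimized = "".join(out)
    # The encoded protein must remain unchanged.
    assert translate(optimized) == original_protein
    return optimized
```

With the placeholder preferences above, `codon_optimize("ATGTTAGCACGTTAA", ILLUSTRATIVE_PREFERENCES)` rewrites the leucine, alanine, and arginine codons while both the input and output translate to the same peptide.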

The present disclosure provides one or more recombinant expression vectors that include (in different recombinant expression vectors in some cases, and in the same recombinant expression vector in some cases): a CasZ trancRNA sequence, a nucleotide sequence encoding a CasZ trancRNA, a nucleotide sequence encoding a CasZ polypeptide (e.g., a wild type CasZ protein, a nickase CasZ protein, a dCasZ protein, chimeric CasZ protein/CasZ fusion protein, and the like), a CasZ guide RNA sequence, a nucleotide sequence encoding a CasZ guide RNA, and a donor polynucleotide (donor template, donor DNA) sequence. In some cases, a subject nucleic acid (e.g., the one or more nucleic acids) is a recombinant expression vector (e.g., plasmid, viral vector, minicircle DNA, and the like). In some cases, the nucleotide sequence encoding the CasZ trancRNA, the nucleotide sequence encoding the CasZ protein, and/or the nucleotide sequence encoding the CasZ guide RNA is (are) operably linked to a promoter (e.g., an inducible promoter), e.g., one that is operable in a cell type of choice (e.g., a prokaryotic cell, a eukaryotic cell, a plant cell, an animal cell, a mammalian cell, a primate cell, a rodent cell, a human cell, etc.).

Suitable expression vectors include viral expression vectors (e.g., viral vectors based on vaccinia virus; poliovirus; adenovirus (see, e.g., Li et al., Invest Ophthalmol Vis Sci 35:2543-2549, 1994; Borras et al., Gene Ther 6:515-524, 1999; Li and Davidson, PNAS 92:7700-7704, 1995; Sakamoto et al., H Gene Ther 5:1088-1097, 1999; WO 94/12649, WO 93/03769; WO 93/19191; WO 94/28938; WO 95/11984 and WO 95/00655); adeno-associated virus (AAV) (see, e.g., Ali et al., Hum Gene Ther 9:81-86, 1998, Flannery et al., PNAS 94:6916-6921, 1997; Bennett et al., Invest Ophthalmol Vis Sci 38:2857-2863, 1997; Jomary et al., Gene Ther 4:683-690, 1997, Rolling et al., Hum Gene Ther 10:641-648, 1999; Ali et al., Hum Mol Genet 5:591-594, 1996; Srivastava in WO 93/09239, Samulski et al., J. Vir. (1989) 63:3822-3828; Mendelson et al., Virol. (1988) 166:154-165; and Flotte et al., PNAS (1993) 90:10613-10617); SV40; herpes simplex virus; human immunodeficiency virus (see, e.g., Miyoshi et al., PNAS 94:10319-23, 1997; Takahashi et al., J Virol 73:7812-7816, 1999); a retroviral vector (e.g., Murine Leukemia Virus, spleen necrosis virus, and vectors derived from retroviruses such as Rous Sarcoma Virus, Harvey Sarcoma Virus, avian leukosis virus, a lentivirus, human immunodeficiency virus, myeloproliferative sarcoma virus, and mammary tumor virus); and the like). In some cases, a recombinant expression vector of the present disclosure is a recombinant adeno-associated virus (AAV) vector. In some cases, a recombinant expression vector of the present disclosure is a recombinant lentivirus vector. In some cases, a recombinant expression vector of the present disclosure is a recombinant retroviral vector.

Depending on the host/vector system utilized, any of a number of suitable transcription and translation control elements, including constitutive and inducible promoters, transcription enhancer elements, transcription terminators, etc. may be used in the expression vector.

In some embodiments, a nucleotide sequence encoding a CasZ guide RNA is operably linked to a control element, e.g., a transcriptional control element, such as a promoter. In some embodiments, a nucleotide sequence encoding a CasZ protein or a CasZ fusion polypeptide is operably linked to a control element, e.g., a transcriptional control element, such as a promoter.

The transcriptional control element can be a promoter. In some cases, the promoter is a constitutively active promoter. In some cases, the promoter is a regulatable promoter. In some cases, the promoter is an inducible promoter. In some cases, the promoter is a tissue-specific promoter. In some cases, the promoter is a cell type-specific promoter. In some cases, the transcriptional control element (e.g., the promoter) is functional in a targeted cell type or targeted cell population. For example, in some cases, the transcriptional control element can be functional in eukaryotic cells, e.g., hematopoietic stem cells (e.g., mobilized peripheral blood (mPB) CD34(+) cell, bone marrow (BM) CD34(+) cell, etc.).

Non-limiting examples of eukaryotic promoters (promoters functional in a eukaryotic cell) include EF1α, those from cytomegalovirus (CMV) immediate early, herpes simplex virus (HSV) thymidine kinase, early and late SV40, long terminal repeats (LTRs) from retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is well within the level of ordinary skill in the art. The expression vector may also contain a ribosome binding site for translation initiation and a transcription terminator. The expression vector may also include appropriate sequences for amplifying expression. The expression vector may also include nucleotide sequences encoding protein tags (e.g., 6×His tag, hemagglutinin tag, fluorescent protein, etc.) that can be fused to the CasZ protein, thus resulting in a chimeric CasZ polypeptide.

In some cases, a nucleotide sequence encoding a CasZ guide RNA and/or a CasZ fusion polypeptide is operably linked to an inducible promoter. In some cases, a nucleotide sequence encoding a CasZ guide RNA and/or a CasZ fusion protein is operably linked to a constitutive promoter.

A promoter can be a constitutively active promoter (i.e., a promoter that is constitutively in an active/“ON” state), it may be an inducible promoter (i.e., a promoter whose state, active/“ON” or inactive/“OFF”, is controlled by an external stimulus, e.g., the presence of a particular temperature, compound, or protein), it may be a spatially restricted promoter (i.e., transcriptional control element, enhancer, etc.) (e.g., tissue specific promoter, cell type specific promoter, etc.), and it may be a temporally restricted promoter (i.e., the promoter is in the “ON” state or “OFF” state during specific stages of embryonic development or during specific stages of a biological process, e.g., hair follicle cycle in mice).

Suitable promoters can be derived from viruses and can therefore be referred to as viral promoters, or they can be derived from any organism, including prokaryotic or eukaryotic organisms. Suitable promoters can be used to drive expression by any RNA polymerase (e.g., pol I, pol II, pol III). Exemplary promoters include, but are not limited to, the SV40 early promoter; mouse mammary tumor virus long terminal repeat (LTR) promoter; adenovirus major late promoter (Ad MLP); a herpes simplex virus (HSV) promoter; a cytomegalovirus (CMV) promoter such as the CMV immediate early promoter region (CMVIE); a Rous sarcoma virus (RSV) promoter; a human U6 small nuclear promoter (U6) (Miyagishi et al., Nature Biotechnology 20, 497-500 (2002)); an enhanced U6 promoter (e.g., Xia et al., Nucleic Acids Res. 2003 Sep. 1; 31(17)); a human H1 promoter (H1); and the like.

In some cases, a nucleotide sequence encoding a CasZ guide RNA is operably linked to (under the control of) a promoter operable in a eukaryotic cell (e.g., a U6 promoter, an enhanced U6 promoter, an H1 promoter, and the like). As would be understood by one of ordinary skill in the art, when expressing an RNA (e.g., a guide RNA) from a nucleic acid (e.g., an expression vector) using a U6 promoter (e.g., in a eukaryotic cell), or another PolIII promoter, the RNA may need to be mutated if there are several Ts in a row (coding for Us in the RNA). This is because a string of Ts (e.g., 5 Ts) in DNA can act as a terminator for polymerase III (PolIII). Thus, in order to ensure transcription of a guide RNA in a eukaryotic cell it may sometimes be necessary to modify the sequence encoding the guide RNA to eliminate runs of Ts. In some cases, a nucleotide sequence encoding a CasZ protein (e.g., a wild type CasZ protein, a nickase CasZ protein, a dCasZ protein, a chimeric CasZ protein and the like) is operably linked to a promoter operable in a eukaryotic cell (e.g., a CMV promoter, an EF1α promoter, an estrogen receptor-regulated promoter, and the like).
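As a non-limiting illustration of the check described above, the following Python sketch scans a DNA sequence encoding a guide RNA for runs of consecutive Ts that might act as a PolIII terminator. The function name `find_t_runs` is hypothetical, and the default threshold of 5 simply mirrors the "e.g., 5 Ts" example in the text; it is an adjustable assumption rather than a fixed rule.

```python
import re


def find_t_runs(dna: str, min_run: int = 5) -> list:
    """Return (start_index, run_length) for each run of >= min_run Ts.

    Indices are 0-based positions in the coding-strand DNA sequence.
    """
    pattern = re.compile(r"T{%d,}" % min_run)
    return [(m.start(), len(m.group())) for m in pattern.finditer(dna.upper())]
```

A sequence for which `find_t_runs` returns a non-empty list is a candidate for modification (e.g., substituting a base within the run) before it is placed under a U6, H1, or other PolIII promoter; whether a given substitution preserves guide RNA function would need to be confirmed separately.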

Examples of inducible promoters include, but are not limited to, T7 RNA polymerase promoter, T3 RNA polymerase promoter, isopropyl-beta-D-thiogalactopyranoside (IPTG)-regulated promoter, lactose induced promoter, heat shock promoter, tetracycline-regulated promoter, steroid-regulated promoter, metal-regulated promoter, estrogen receptor-regulated promoter, etc. Inducible promoters can therefore be regulated by molecules including, but not limited to, doxycycline; estrogen and/or an estrogen analog; IPTG; etc.

Inducible promoters suitable for use include any inducible promoter described herein or known to one of ordinary skill in the art. Examples of inducible promoters include, without limitation, chemically/biochemically-regulated and physically-regulated promoters such as alcohol-regulated promoters, tetracycline-regulated promoters (e.g., anhydrotetracycline (aTc)-responsive promoters and other tetracycline-responsive promoter systems, which include a tetracycline repressor protein (tetR), a tetracycline operator sequence (tetO) and a tetracycline transactivator fusion protein (tTA)), steroid-regulated promoters (e.g., promoters based on the rat glucocorticoid receptor, human estrogen receptor, moth ecdysone receptors, and promoters from the steroid/retinoid/thyroid receptor superfamily), metal-regulated promoters (e.g., promoters derived from metallothionein (proteins that bind and sequester metal ions) genes from yeast, mouse and human), pathogenesis-regulated promoters (e.g., induced by salicylic acid, ethylene or benzothiadiazole (BTH)), temperature/heat-inducible promoters (e.g., heat shock promoters), and light-regulated promoters (e.g., light responsive promoters from plant cells).

In some cases, the promoter is a spatially restricted promoter (i.e., cell type specific promoter, tissue specific promoter, etc.) such that in a multi-cellular organism, the promoter is active (i.e., “ON”) in a subset of specific cells. Spatially restricted promoters may also be referred to as enhancers, transcriptional control elements, control sequences, etc. Any convenient spatially restricted promoter may be used as long as the promoter is functional in the targeted host cell (e.g., eukaryotic cell; prokaryotic cell).

In some cases, the promoter is a reversible promoter. Suitable reversible promoters, including reversible inducible promoters, are known in the art. Such reversible promoters may be isolated and derived from many organisms, e.g., eukaryotes and prokaryotes. Modification of reversible promoters derived from a first organism for use in a second organism, e.g., the first a prokaryote and the second a eukaryote, the first a eukaryote and the second a prokaryote, etc., is well known in the art. Such reversible promoters, and systems based on such reversible promoters but also comprising additional control proteins, include, but are not limited to, alcohol regulated promoters (e.g., alcohol dehydrogenase I (alcA) gene promoter, promoters responsive to alcohol transactivator proteins (AlcR), etc.), tetracycline regulated promoters (e.g., promoter systems including TetActivators, TetON, TetOFF, etc.), steroid regulated promoters (e.g., rat glucocorticoid receptor promoter systems, human estrogen receptor promoter systems, retinoid promoter systems, thyroid promoter systems, ecdysone promoter systems, mifepristone promoter systems, etc.), metal regulated promoters (e.g., metallothionein promoter systems, etc.), pathogenesis-related regulated promoters (e.g., salicylic acid regulated promoters, ethylene regulated promoters, benzothiadiazole regulated promoters, etc.), temperature regulated promoters (e.g., heat shock inducible promoters (e.g., HSP-70, HSP-90, soybean heat shock promoter, etc.)), light regulated promoters, synthetic inducible promoters, and the like.

Methods of introducing a nucleic acid (e.g., DNA or RNA) (e.g., a nucleic acid comprising a donor polynucleotide sequence, one or more nucleic acids encoding a CasZ protein and/or a CasZ guide RNA and/or a CasZ trancRNA, and the like) into a host cell are known in the art, and any convenient method can be used to introduce a nucleic acid (e.g., an expression construct) into a cell. Suitable methods include, e.g., viral infection, transfection, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery, and the like.

Introducing the recombinant expression vector into cells can occur in any culture media and under any culture conditions that promote the survival of the cells. Introducing the recombinant expression vector into a target cell can be carried out in vivo or ex vivo. Introducing the recombinant expression vector into a target cell can be carried out in vitro.

In some cases, a CasZ protein can be provided as RNA. The RNA can be provided by direct chemical synthesis or may be transcribed in vitro from a DNA (e.g., encoding the CasZ protein). Once synthesized, the RNA may be introduced into a cell by any of the well-known techniques for introducing nucleic acids into cells (e.g., microinjection, electroporation, transfection, etc.).

Nucleic acids may be provided to the cells using well-developed transfection techniques; see, e.g., Angel and Yanik (2010) PLoS ONE 5(7): e11756, and the commercially available TransMessenger® reagents from Qiagen, Stemfect™ RNA Transfection Kit from Stemgent, and TransIT®-mRNA Transfection Kit from Mirus Bio LLC. See also Beumer et al. (2008) PNAS 105(50):19821-19826.

Vectors may be provided directly to a target host cell. In other words, the cells are contacted with vectors comprising the subject nucleic acids (e.g., recombinant expression vectors having the donor template sequence and encoding the CasZ guide RNA; recombinant expression vectors encoding the CasZ protein; etc.) such that the vectors are taken up by the cells. Methods for contacting cells with nucleic acid vectors that are plasmids, including electroporation, calcium chloride transfection, microinjection, and lipofection, are well known in the art. For viral vector delivery, cells can be contacted with viral particles comprising the subject viral expression vectors.

Retroviruses, for example, lentiviruses, are suitable for use in methods of the present disclosure. Commonly used retroviral vectors are “defective”, i.e., unable to produce viral proteins required for productive infection. Rather, replication of the vector requires growth in a packaging cell line. To generate viral particles comprising nucleic acids of interest, the retroviral nucleic acids comprising the nucleic acid are packaged into viral capsids by a packaging cell line. Different packaging cell lines provide a different envelope protein (ecotropic, amphotropic or xenotropic) to be incorporated into the capsid, this envelope protein determining the specificity of the viral particle for the cells (ecotropic for murine and rat; amphotropic for most mammalian cell types including human, dog and mouse; and xenotropic for most mammalian cell types except murine cells). The appropriate packaging cell line may be used to ensure that the cells are targeted by the packaged viral particles. Methods of introducing subject expression vectors into packaging cell lines and of collecting the viral particles that are generated by the packaging lines are well known in the art. Nucleic acids can also be introduced by direct micro-injection (e.g., injection of RNA).

Vectors used for providing the nucleic acids encoding a CasZ guide RNA and/or a CasZ polypeptide to a target host cell can include suitable promoters for driving the expression, that is, transcriptional activation, of the nucleic acid of interest. In other words, in some cases, the nucleic acid of interest will be operably linked to a promoter. This may include ubiquitously acting promoters, for example, the CMV-β-actin promoter, or inducible promoters, such as promoters that are active in particular cell populations or that respond to the presence of drugs such as tetracycline. By transcriptional activation, it is intended that transcription will be increased above basal levels in the target cell by 10 fold, by 100 fold, more usually by 1000 fold. In addition, vectors used for providing a nucleic acid encoding a CasZ guide RNA and/or a CasZ protein to a cell may include nucleic acid sequences that encode selectable markers in the target cells, so as to identify cells that have taken up the CasZ guide RNA and/or CasZ protein.

A nucleic acid comprising a nucleotide sequence encoding a CasZ polypeptide, or a CasZ fusion polypeptide, is in some cases an RNA. Thus, a CasZ fusion protein can be introduced into cells as RNA. Methods of introducing RNA into cells are known in the art and may include, for example, direct injection, transfection, or any other method used for the introduction of DNA. A CasZ protein may instead be provided to cells as a polypeptide. Such a polypeptide may optionally be fused to a polypeptide domain that increases solubility of the product. The domain may be linked to the polypeptide through a defined protease cleavage site, e.g., a TEV sequence, which is cleaved by TEV protease. The linker may also include one or more flexible sequences, e.g., from 1 to 10 glycine residues. In some embodiments, the cleavage of the fusion protein is performed in a buffer that maintains solubility of the product, e.g., in the presence of from 0.5 to 2 M urea, in the presence of polypeptides and/or polynucleotides that increase solubility, and the like. Domains of interest include endosomolytic domains, e.g., influenza HA domain; and other polypeptides that aid in production, e.g., IF2 domain, GST domain, GRPE domain, and the like. The polypeptide may be formulated for improved stability. For example, the peptides may be PEGylated, where the polyethyleneoxy group provides for enhanced lifetime in the blood stream.

Additionally, or alternatively, a CasZ polypeptide of the present disclosure may be fused to a polypeptide permeant domain to promote uptake by the cell. A number of permeant domains are known in the art and may be used in the non-integrating polypeptides of the present disclosure, including peptides, peptidomimetics, and non-peptide carriers. For example, a permeant peptide may be derived from the third alpha helix of Drosophila melanogaster transcription factor Antennapedia, referred to as penetratin, which comprises the amino acid sequence RQIKIWFQNRRMKWKK (SEQ ID NO: 134). As another example, the permeant peptide comprises the HIV-1 tat basic region amino acid sequence, which may include, for example, amino acids 49-57 of naturally-occurring tat protein. Other permeant domains include poly-arginine motifs, for example, the region of amino acids 34-56 of HIV-1 rev protein, nona-arginine, octa-arginine, and the like. (See, for example, Futaki et al. (2003) Curr Protein Pept Sci. 2003 April; 4(2): 87-9 and 446; and Wender et al. (2000) Proc. Natl. Acad. Sci. U.S.A 2000 Nov. 21; 97(24):13003-8; published U.S. Patent applications 20030220334; 20030083256; 20030032593; and 20030022831, herein specifically incorporated by reference for the teachings of translocation peptides and peptoids). The nona-arginine (R9) sequence is one of the more efficient protein transduction domains (PTDs) that have been characterized (Wender et al. 2000; Uemura et al. 2002). The site at which the fusion is made may be selected in order to optimize the biological activity, secretion or binding characteristics of the polypeptide. The optimal site will be determined by routine experimentation.

A CasZ polypeptide of the present disclosure may be produced in vitro or by eukaryotic cells or by prokaryotic cells, and it may be further processed by unfolding, e.g., heat denaturation, dithiothreitol reduction, etc., and may be further refolded, using methods known in the art.

Modifications of interest that do not alter primary sequence include chemical derivatization of polypeptides, e.g., acylation, acetylation, carboxylation, amidation, etc. Also included are modifications of glycosylation, e.g., those made by modifying the glycosylation patterns of a polypeptide during its synthesis and processing or in further processing steps; e.g., by exposing the polypeptide to enzymes which affect glycosylation, such as mammalian glycosylating or deglycosylating enzymes. Also embraced are sequences that have phosphorylated amino acid residues, e.g., phosphotyrosine, phosphoserine, or phosphothreonine.

Also suitable for inclusion in embodiments of the present disclosure are nucleic acids (e.g., encoding a CasZ guide RNA, encoding a CasZ fusion protein, etc.) and proteins (e.g., a CasZ fusion protein derived from a wild type protein or a variant protein) that have been modified using ordinary molecular biological techniques and synthetic chemistry so as to improve their resistance to proteolytic degradation, to change the target sequence specificity, to optimize solubility properties, to alter protein activity (e.g., transcription modulatory activity, enzymatic activity, etc.) or to render them more suitable. Analogs of such polypeptides include those containing residues other than naturally occurring L-amino acids, e.g., D-amino acids or non-naturally occurring synthetic amino acids. D-amino acids may be substituted for some or all of the amino acid residues.

A CasZ polypeptide of the present disclosure may be prepared by in vitro synthesis, using conventional methods as known in the art. Various commercial synthetic apparatuses are available, for example, automated synthesizers by Applied Biosystems, Inc., Beckman, etc. By using synthesizers, naturally occurring amino acids may be substituted with unnatural amino acids. The particular sequence and the manner of preparation will be determined by convenience, economics, purity required, and the like.

If desired, various groups may be introduced into the peptide during synthesis or during expression, which allow for linking to other molecules or to a surface. Thus cysteines can be used to make thioethers, histidines for linking to a metal ion complex, carboxyl groups for forming amides or esters, amino groups for forming amides, and the like.

A CasZ polypeptide of the present disclosure may also be isolated and purified in accordance with conventional methods of recombinant synthesis. A lysate may be prepared of the expression host and the lysate purified using high performance liquid chromatography (HPLC), exclusion chromatography, gel electrophoresis, affinity chromatography, or other purification technique. For the most part, the compositions which are used will comprise 20% or more by weight of the desired product, more usually 75% or more by weight, preferably 95% or more by weight, and for therapeutic purposes, usually 99.5% or more by weight, in relation to contaminants related to the method of preparation of the product and its purification. Usually, the percentages will be based upon total protein. Thus, in some cases, a CasZ polypeptide, or a CasZ fusion polypeptide, of the present disclosure is at least 80% pure, at least 85% pure, at least 90% pure, at least 95% pure, at least 98% pure, or at least 99% pure (e.g., free of contaminants, non-CasZ proteins or other macromolecules, etc.).

To induce cleavage or any desired modification to a target nucleic acid (e.g., genomic DNA), or any desired modification to a polypeptide associated with target nucleic acid, the CasZ guide RNA and/or the CasZ polypeptide and/or the CasZ trancRNA, and/or the donor template sequence, whether they be introduced as nucleic acids or polypeptides, can be provided to the cells for about 30 minutes to about 24 hours, e.g., 1 hour, 1.5 hours, 2 hours, 2.5 hours, 3 hours, 3.5 hours, 4 hours, 5 hours, 6 hours, 7 hours, 8 hours, 12 hours, 16 hours, 18 hours, 20 hours, or any other period from about 30 minutes to about 24 hours, which may be repeated with a frequency of about every day to about every 4 days, e.g., every 1.5 days, every 2 days, every 3 days, or any other frequency from about every day to about every four days. The agent(s) may be provided to the subject cells one or more times, e.g., one time, twice, three times, or more than three times, and the cells allowed to incubate with the agent(s) for some amount of time following each contacting event, e.g., 16-24 hours, after which time the media is replaced with fresh media and the cells are cultured further.

In cases in which two or more different targeting complexes are provided to the cell (e.g., two different CasZ guide RNAs that are complementary to different sequences within the same or different target nucleic acid), the complexes may be provided or delivered simultaneously (e.g., as two polypeptides and/or nucleic acids). Alternatively, they may be provided consecutively, e.g., the first targeting complex being provided first, followed by the second targeting complex, etc., or vice versa.

To improve the delivery of a DNA vector into a target cell, the DNA can be protected from damage and its entry into the cell facilitated, for example, by using lipoplexes and polyplexes. Thus, in some cases, a nucleic acid of the present disclosure (e.g., a recombinant expression vector of the present disclosure) can be covered with lipids in an organized structure like a micelle or a liposome. When the organized structure is complexed with DNA it is called a lipoplex. There are three types of lipids: anionic (negatively-charged), neutral, or cationic (positively-charged). Lipoplexes that utilize cationic lipids have proven utility for gene transfer. Cationic lipids, due to their positive charge, naturally complex with the negatively charged DNA. Also as a result of their charge, they interact with the cell membrane. Endocytosis of the lipoplex then occurs, and the DNA is released into the cytoplasm. The cationic lipids also protect against degradation of the DNA by the cell.

Complexes of polymers with DNA are called polyplexes. Most polyplexes consist of cationic polymers and their production is regulated by ionic interactions. One large difference between the methods of action of polyplexes and lipoplexes is that polyplexes cannot release their DNA load into the cytoplasm, so to this end, co-transfection with endosome-lytic agents (to lyse the endosome that is made during endocytosis) such as inactivated adenovirus must occur. However, this is not always the case; polymers such as polyethylenimine have their own method of endosome disruption, as do chitosan and trimethylchitosan.

Dendrimers, highly branched macromolecules with a spherical shape, may also be used to genetically modify stem cells. The surface of the dendrimer particle may be functionalized to alter its properties. In particular, it is possible to construct a cationic dendrimer (i.e., one with a positive surface charge). When in the presence of genetic material such as a DNA plasmid, charge complementarity leads to a temporary association of the nucleic acid with the cationic dendrimer. On reaching its destination, the dendrimer-nucleic acid complex can be taken up into a cell by endocytosis.

In some cases, a nucleic acid of the disclosure (e.g., an expression vector) includes an insertion site for a guide sequence of interest. For example, a nucleic acid can include an insertion site for a guide sequence of interest, where the insertion site is immediately adjacent to a nucleotide sequence encoding the portion of a CasZ guide RNA that does not change when the guide sequence is changed to hybridize to a desired target sequence (e.g., sequences that contribute to the CasZ binding aspect of the guide RNA, e.g., the sequences that contribute to the dsRNA duplex(es) of the CasZ guide RNA; this portion of the guide RNA can also be referred to as the ‘scaffold’ or ‘constant region’ of the guide RNA). Thus, in some cases, a subject nucleic acid (e.g., an expression vector) includes a nucleotide sequence encoding a CasZ guide RNA, except that the portion encoding the guide sequence portion of the guide RNA is an insertion sequence (an insertion site). An insertion site is any nucleotide sequence used for the insertion of a desired sequence. “Insertion sites” for use with various technologies are known to those of ordinary skill in the art and any convenient insertion site can be used. An insertion site can be for any method for manipulating nucleic acid sequences. For example, in some cases the insertion site is a multiple cloning site (MCS) (e.g., a site including one or more restriction enzyme recognition sequences), a site for ligation independent cloning, a site for recombination-based cloning (e.g., recombination based on att sites), a nucleotide sequence recognized by a CRISPR/Cas (e.g., Cas9) based technology, and the like.

An insertion site can be any desirable length, and can depend on the type of insertion site (e.g., can depend on whether the site includes one or more restriction enzyme recognition sequences (and how many), whether the site includes a target site for a CRISPR/Cas protein, etc.). In some cases, an insertion site of a subject nucleic acid is 3 or more nucleotides (nt) in length (e.g., 5 or more, 8 or more, 10 or more, 15 or more, 17 or more, 18 or more, 19 or more, 20 or more, 25 or more, or 30 or more nt in length). In some cases, an insertion site of a subject nucleic acid has a length in a range of from 2 to 50 nucleotides (nt) (e.g., from 2 to 40 nt, from 2 to 30 nt, from 2 to 25 nt, from 2 to 20 nt, from 5 to 50 nt, from 5 to 40 nt, from 5 to 30 nt, from 5 to 25 nt, from 5 to 20 nt, from 10 to 50 nt, from 10 to 40 nt, from 10 to 30 nt, from 10 to 25 nt, from 10 to 20 nt, from 17 to 50 nt, from 17 to 40 nt, from 17 to 30 nt, from 17 to 25 nt). In some cases, an insertion site of a subject nucleic acid has a length in a range of from 5 to 40 nt.
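By way of informal illustration only, the length ranges recited above can be expressed as a simple check. The following Python sketch is not part of the disclosed methods; the function name, the default bounds (5 to 40 nt, taken from the last range recited above), and the example sequence (the EcoRI recognition sequence GAATTC, standing in for a site with a single restriction enzyme recognition sequence) are assumptions chosen for the example.

```python
def insertion_site_length_ok(site: str, min_nt: int = 5, max_nt: int = 40) -> bool:
    """Return True if the insertion site length (in nt) lies in [min_nt, max_nt].

    The default bounds are an illustrative choice, not a requirement.
    """
    if min_nt > max_nt:
        raise ValueError("min_nt must not exceed max_nt")
    return min_nt <= len(site) <= max_nt


# Hypothetical example site: one 6-nt restriction enzyme recognition sequence.
example_site = "GAATTC"

# Checked against the 5 to 40 nt range, then against the 17 to 50 nt range.
print(insertion_site_length_ok(example_site))
print(insertion_site_length_ok(example_site, min_nt=17, max_nt=50))
```

Any of the other recited ranges can be tested by passing different bounds.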

Nucleic Acid Modifications

In some embodiments, a subject nucleic acid (e.g., a CasZ guide RNA or trancRNA) has one or more modifications, e.g., a base modification, a backbone modification, etc., to provide the nucleic acid with a new or enhanced feature (e.g., improved stability). A nucleoside is a base-sugar combination. The base portion of the nucleoside is normally a heterocyclic base. The two most common classes of such heterocyclic bases are the purines and the pyrimidines. Nucleotides are nucleosides that further include a phosphate group covalently linked to the sugar portion of the nucleoside. For those nucleosides that include a pentofuranosyl sugar, the phosphate group can be linked to the 2′, the 3′, or the 5′ hydroxyl moiety of the sugar. In forming oligonucleotides, the phosphate groups covalently link adjacent nucleosides to one another to form a linear polymeric compound. In turn, the respective ends of this linear polymeric compound can be further joined to form a circular compound; however, linear compounds are suitable. In addition, linear compounds may have internal nucleotide base complementarity and may therefore fold in a manner as to produce a fully or partially double-stranded compound. Within oligonucleotides, the phosphate groups are commonly referred to as forming the internucleoside backbone of the oligonucleotide. The normal linkage or backbone of RNA and DNA is a 3′ to 5′ phosphodiester linkage.

Suitable nucleic acid modifications include, but are not limited to: 2′-O-Methyl modified nucleotides, 2′ Fluoro modified nucleotides, locked nucleic acid (LNA) modified nucleotides, peptide nucleic acid (PNA) modified nucleotides, nucleotides with phosphorothioate linkages, and a 5′ cap (e.g., a 7-methylguanylate cap (m7G)). Additional details and additional modifications are described below.

A 2′-O-Methyl modified nucleotide (also referred to as 2′-O-Methyl RNA) is a naturally occurring modification of RNA found in tRNA and other small RNAs that arises as a post-transcriptional modification. Oligonucleotides can be directly synthesized that contain 2′-O-Methyl RNA. This modification increases Tm of RNA:RNA duplexes but results in only small changes in RNA:DNA stability. It is stable with respect to attack by single-stranded ribonucleases and is typically 5 to 10-fold less susceptible to DNases than DNA. It is commonly used in antisense oligos as a means to increase stability and binding affinity to the target message.

2′ Fluoro modified nucleotides (e.g., 2′ Fluoro bases) have a fluorine modified ribose which increases binding affinity (Tm) and also confers some relative nuclease resistance when compared to native RNA. These modifications are commonly employed in ribozymes and siRNAs to improve stability in serum or other biological fluids.

LNA bases have a modification to the ribose backbone that locks the base in the C3′-endo position, which favors RNA A-type helix duplex geometry. This modification significantly increases Tm and is also very nuclease resistant. Multiple LNA insertions can be placed in an oligo at any position except the 3′-end. Applications have been described ranging from antisense oligos to hybridization probes to SNP detection and allele specific PCR. Due to the large increase in Tm conferred by LNAs, they also can cause an increase in primer dimer formation as well as self-hairpin formation. In some cases, the number of LNAs incorporated into a single oligo is 10 bases or less.
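The two placement guidelines stated above (no LNA at the 3′-end, and in some cases 10 or fewer LNAs per oligo) can be sketched as a simple design check. The Python function below is a hypothetical illustration; its name, the 0-based position convention (index 0 is the 5′-most nucleotide), and the example inputs are assumptions made for the example and do not describe any particular design tool.

```python
from typing import Iterable


def lna_design_ok(oligo_length: int, lna_positions: Iterable[int], max_lna: int = 10) -> bool:
    """Check an LNA placement plan against the two guidelines in the text.

    Positions are 0-based, with index 0 at the 5' end (an assumed convention).
    """
    positions = set(lna_positions)
    if oligo_length < 1:
        raise ValueError("oligo_length must be at least 1")
    if any(p < 0 or p >= oligo_length for p in positions):
        raise ValueError("LNA position outside the oligo")
    three_prime_index = oligo_length - 1
    # Guideline 1: no LNA at the 3'-end. Guideline 2: at most max_lna LNAs.
    return three_prime_index not in positions and len(positions) <= max_lna


# Hypothetical 20-nt oligo with LNAs at three internal positions.
print(lna_design_ok(20, [2, 7, 12]))
# Same oligo with an LNA on the 3'-terminal nucleotide (index 19).
print(lna_design_ok(20, [2, 7, 19]))
```

The upper limit is exposed as a parameter because the text presents it as applying only in some cases.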

The phosphorothioate (PS) bond (i.e., a phosphorothioate linkage) substitutes a sulfur atom for a non-bridging oxygen in the phosphate backbone of a nucleic acid (e.g., an oligo). This modification renders the internucleotide linkage resistant to nuclease degradation. Phosphorothioate bonds can be introduced between the last 3-5 nucleotides at the 5′- or 3′-end of the oligo to inhibit exonuclease degradation. Including phosphorothioate bonds within the oligo (e.g., throughout the entire oligo) can help reduce attack by endonucleases as well.
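As a hedged illustration of the terminal placement described above, the following Python sketch marks phosphorothioate linkages near each end of an oligo. Several choices are assumptions made for the example: the function name, the use of an asterisk between two letters to denote a phosphorothioate linkage, the default of three modified linkages per end, and the 10-nt example sequence, which is arbitrary and not taken from the Sequence Listing. How the phrase "the last 3-5 nucleotides" maps onto a number of linkages is also a choice of this sketch.

```python
def mark_ps_linkages(seq: str, n_five_prime: int = 3, n_three_prime: int = 3) -> str:
    """Return seq (written 5' to 3') with '*' at each phosphorothioate linkage.

    n_five_prime and n_three_prime are the numbers of terminal linkages to
    modify at the 5' and 3' ends, respectively (illustrative parameters).
    """
    n_links = len(seq) - 1  # an oligo of N nucleotides has N - 1 linkages
    if n_five_prime < 0 or n_three_prime < 0:
        raise ValueError("linkage counts must be non-negative")
    if n_five_prime + n_three_prime > n_links:
        raise ValueError("more modified linkages requested than the oligo has")
    out = []
    for i, base in enumerate(seq):
        out.append(base)
        # Linkage i joins nucleotide i and nucleotide i + 1.
        if i < n_links and (i < n_five_prime or i >= n_links - n_three_prime):
            out.append("*")
    return "".join(out)


# Arbitrary example sequence; three modified linkages at each end.
print(mark_ps_linkages("ACGUACGUAC"))            # A*C*G*UACG*U*A*C
# Protect only the 3' end.
print(mark_ps_linkages("ACGUACGUAC", 0, 3))      # ACGUACG*U*A*C
```

Setting both counts so that they sum to the total number of linkages marks every linkage, which corresponds to the fully modified case mentioned in the text.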

In some embodiments, a subject nucleic acid has one or more nucleotides that are 2′-O-Methyl modified nucleotides. In some embodiments, a subject nucleic acid (e.g., a guide RNA, a trancRNA, etc.) has one or more 2′ Fluoro modified nucleotides. In some embodiments, a subject nucleic acid (e.g., a dsRNA, a siNA, etc.) has one or more LNA bases. In some embodiments, a subject nucleic acid (e.g., a dsRNA, a siNA, etc.) has one or more nucleotides that are linked by a phosphorothioate bond (i.e., the subject nucleic acid has one or more phosphorothioate linkages). In some embodiments, a subject nucleic acid (e.g., a dsRNA, a siNA, etc.) has a 5′ cap (e.g., a 7-methylguanylate cap (m7G)). In some embodiments, a subject nucleic acid (e.g., a guide RNA, a trancRNA, etc.) has a combination of modified nucleotides. For example, a subject nucleic acid (e.g., a guide RNA, a trancRNA, etc.) can have a 5′ cap (e.g., a 7-methylguanylate cap (m7G)) in addition to having one or more nucleotides with other modifications (e.g., a 2′-O-Methyl nucleotide and/or a 2′ Fluoro modified nucleotide and/or a LNA base and/or a phosphorothioate linkage).

Modified Backbones and Modified Internucleoside Linkages

Examples of suitable nucleic acids (e.g., a CasZ guide RNA and/or CasZ trancRNA) containing modifications include nucleic acids containing modified backbones or non-natural internucleoside linkages. Nucleic acids having modified backbones include those that retain a phosphorus atom in the backbone and those that do not have a phosphorus atom in the backbone.

Suitable modified oligonucleotide backbones containing a phosphorus atom therein include, for example, phosphorothioates, chiral phosphorothioates, phosphorodithioates, phosphotriesters, aminoalkylphosphotriesters, methyl and other alkyl phosphonates including 3′-alkylene phosphonates, 5′-alkylene phosphonates and chiral phosphonates, phosphinates, phosphoramidates including 3′-amino phosphoramidate and aminoalkylphosphoramidates, phosphorodiamidates, thionophosphoramidates, thionoalkylphosphonates, thionoalkylphosphotriesters, selenophosphates and boranophosphates having normal 3′-5′ linkages, 2′-5′ linked analogs of these, and those having inverted polarity wherein one or more internucleotide linkages is a 3′ to 3′, 5′ to 5′ or 2′ to 2′ linkage. Suitable oligonucleotides having inverted polarity comprise a single 3′ to 3′ linkage at the 3′-most internucleotide linkage, i.e., a single inverted nucleoside residue which may be abasic (the nucleobase is missing or has a hydroxyl group in place thereof). Various salts (such as, for example, potassium or sodium), mixed salts and free acid forms are also included.

In some embodiments, a subject nucleic acid comprises one or more phosphorothioate and/or heteroatom internucleoside linkages, in particular —CH₂—NH—O—CH₂—, —CH₂—N(CH₃)—O—CH₂— (known as a methylene (methylimino) or MMI backbone), —CH₂—O—N(CH₃)—CH₂—, —CH₂—N(CH₃)—N(CH₃)—CH₂— and —O—N(CH₃)—CH₂—CH₂— (wherein the native phosphodiester internucleotide linkage is represented as —O—P(═O)(OH)—O—CH₂—). MMI type internucleoside linkages are disclosed in the above referenced U.S. Pat. No. 5,489,677, the disclosure of which is incorporated herein by reference in its entirety. Suitable amide internucleoside linkages are disclosed in U.S. Pat. No. 5,602,240, the disclosure of which is incorporated herein by reference in its entirety.

Also suitable are nucleic acids having morpholino backbone structures as described in, e.g., U.S. Pat. No. 5,034,506. For example, in some embodiments, a subject nucleic acid comprises a 6-membered morpholino ring in place of a ribose ring. In some of these embodiments, a phosphorodiamidate or other non-phosphodiester internucleoside linkage replaces a phosphodiester linkage.

Suitable modified polynucleotide backbones that do not include a phosphorus atom therein have backbones that are formed by short chain alkyl or cycloalkyl internucleoside linkages, mixed heteroatom and alkyl or cycloalkyl internucleoside linkages, or one or more short chain heteroatomic or heterocyclic internucleoside linkages. These include those having morpholino linkages (formed in part from the sugar portion of a nucleoside); siloxane backbones; sulfide, sulfoxide and sulfone backbones; formacetyl and thioformacetyl backbones; methylene formacetyl and thioformacetyl backbones; riboacetyl backbones; alkene containing backbones; sulfamate backbones; methyleneimino and methylenehydrazino backbones; sulfonate and sulfonamide backbones; amide backbones; and others having mixed N, O, S and CH₂ component parts.

Mimetics

A subject nucleic acid can be a nucleic acid mimetic. The term “mimetic” as it is applied to polynucleotides is intended to include polynucleotides wherein only the furanose ring or both the furanose ring and the internucleotide linkage are replaced with non-furanose groups; replacement of only the furanose ring is also referred to in the art as being a sugar surrogate. The heterocyclic base moiety or a modified heterocyclic base moiety is maintained for hybridization with an appropriate target nucleic acid. One such nucleic acid, a polynucleotide mimetic that has been shown to have excellent hybridization properties, is referred to as a peptide nucleic acid (PNA). In PNA, the sugar-backbone of a polynucleotide is replaced with an amide containing backbone, in particular an aminoethylglycine backbone. The nucleotides are retained and are bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone.

One polynucleotide mimetic that has been reported to have excellent hybridization properties is a peptide nucleic acid (PNA). The backbone in PNA compounds is two or more linked aminoethylglycine units which gives PNA an amide containing backbone. The heterocyclic base moieties are bound directly or indirectly to aza nitrogen atoms of the amide portion of the backbone. Representative U.S. patents that describe the preparation of PNA compounds include, but are not limited to: U.S. Pat. Nos. 5,539,082; 5,714,331; and 5,719,262, the disclosures of which are incorporated herein by reference in their entirety.

Another class of polynucleotide mimetic that has been studied is based on linked morpholino units (morpholino nucleic acid) having heterocyclic bases attached to the morpholino ring. A number of linking groups have been reported that link the morpholino monomeric units in a morpholino nucleic acid. One class of linking groups has been selected to give a non-ionic oligomeric compound. The non-ionic morpholino-based oligomeric compounds are less likely to have undesired interactions with cellular proteins. Morpholino-based polynucleotides are non-ionic mimics of oligonucleotides which are less likely to form undesired interactions with cellular proteins (Dwaine A. Braasch and David R. Corey, Biochemistry, 2002, 41(14), 4503-4510). Morpholino-based polynucleotides are disclosed in U.S. Pat. No. 5,034,506, the disclosure of which is incorporated herein by reference in its entirety. A variety of compounds within the morpholino class of polynucleotides have been prepared, having a variety of different linking groups joining the monomeric subunits.

A further class of polynucleotide mimetic is referred to as cyclohexenyl nucleic acids (CeNA). The furanose ring normally present in a DNA/RNA molecule is replaced with a cyclohexenyl ring. CeNA DMT protected phosphoramidite monomers have been prepared and used for oligomeric compound synthesis following classical phosphoramidite chemistry. Fully modified CeNA oligomeric compounds and oligonucleotides having specific positions modified with CeNA have been prepared and studied (see Wang et al., J. Am. Chem. Soc., 2000, 122, 8595-8602, the disclosure of which is incorporated herein by reference in its entirety). In general, the incorporation of CeNA monomers into a DNA chain increases the stability of a DNA/RNA hybrid. CeNA oligoadenylates formed complexes with RNA and DNA complements with similar stability to the native complexes. The study of incorporating CeNA structures into natural nucleic acid structures was shown by NMR and circular dichroism to proceed with easy conformational adaptation.

A further modification includes Locked Nucleic Acids (LNAs) in which the 2′-hydroxyl group is linked to the 4′ carbon atom of the sugar ring, thereby forming a 2′-C,4′-C-oxymethylene linkage and thus a bicyclic sugar moiety. The linkage can be a methylene (—CH₂—)_(n) group bridging the 2′ oxygen atom and the 4′ carbon atom wherein n is 1 or 2 (Singh et al., Chem. Commun., 1998, 4, 455-456, the disclosure of which is incorporated herein by reference in its entirety). LNA and LNA analogs display very high duplex thermal stabilities with complementary DNA and RNA (Tm=+3 to +10° C.), stability towards 3′-exonucleolytic degradation and good solubility properties. Potent and nontoxic antisense oligonucleotides containing LNAs have been described (e.g., Wahlestedt et al., Proc. Natl. Acad. Sci. U.S.A., 2000, 97, 5633-5638, the disclosure of which is incorporated herein by reference in its entirety).

The synthesis and preparation of the LNA monomers adenine, cytosine, guanine, 5-methyl-cytosine, thymine and uracil, along with their oligomerization, and nucleic acid recognition properties have been described (e.g., Koshkin et al., Tetrahedron, 1998, 54, 3607-3630, the disclosure of which is incorporated herein by reference in its entirety). LNAs and preparation thereof are also described in WO 98/39352 and WO 99/14226, as well as U.S. applications 20120165514, 20100216983, 20090041809, 20060117410, 20040014959, 20020094555, and 20020086998, the disclosures of which are incorporated herein by reference in their entirety.

Modified Sugar Moieties

A subject nucleic acid can also include one or more substituted sugar moieties. Suitable polynucleotides comprise a sugar substituent group selected from: OH; F; O-, S-, or N-alkyl; O-, S-, or N-alkenyl; O-, S- or N-alkynyl; or O-alkyl-O-alkyl, wherein the alkyl, alkenyl and alkynyl may be substituted or unsubstituted C₁ to C₁₀ alkyl or C₂ to C₁₀ alkenyl and alkynyl. Particularly suitable are O((CH₂)_(n)O)_(m)CH₃, O(CH₂)_(n)OCH₃, O(CH₂)_(n)NH₂, O(CH₂)_(n)CH₃, O(CH₂)_(n)ONH₂, and O(CH₂)_(n)ON((CH₂)_(n)CH₃)₂, where n and m are from 1 to about 10. Other suitable polynucleotides comprise a sugar substituent group selected from: C₁ to C₁₀ lower alkyl, substituted lower alkyl, alkenyl, alkynyl, alkaryl, aralkyl, O-alkaryl or O-aralkyl, SH, SCH₃, OCN, Cl, Br, CN, CF₃, OCF₃, SOCH₃, SO₂CH₃, ONO₂, NO₂, N₃, NH₂, heterocycloalkyl, heterocycloalkaryl, aminoalkylamino, polyalkylamino, substituted silyl, an RNA cleaving group, a reporter group, an intercalator, a group for improving the pharmacokinetic properties of an oligonucleotide, or a group for improving the pharmacodynamic properties of an oligonucleotide, and other substituents having similar properties. A suitable modification includes 2′-methoxyethoxy (2′-O—CH₂CH₂OCH₃, also known as 2′-O-(2-methoxyethyl) or 2′-MOE) (Martin et al., Helv. Chim. Acta, 1995, 78, 486-504, the disclosure of which is incorporated herein by reference in its entirety), i.e., an alkoxyalkoxy group. A further suitable modification includes 2′-dimethylaminooxyethoxy, i.e., a O(CH₂)₂ON(CH₃)₂ group, also known as 2′-DMAOE, as described in examples hereinbelow, and 2′-dimethylaminoethoxyethoxy (also known in the art as 2′-O-dimethyl-amino-ethoxy-ethyl or 2′-DMAEOE), i.e., 2′-O—CH₂—O—CH₂—N(CH₃)₂.

Other suitable sugar substituent groups include methoxy (—O—CH₃), aminopropoxy (—OCH₂CH₂CH₂NH₂), allyl (—CH₂—CH═CH₂), —O-allyl (—O—CH₂—CH═CH₂) and fluoro (F). 2′-sugar substituent groups may be in the arabino (up) position or ribo (down) position. A suitable 2′-arabino modification is 2′-F. Similar modifications may also be made at other positions on the oligomeric compound, particularly the 3′ position of the sugar on the 3′ terminal nucleoside or in 2′-5′ linked oligonucleotides and the 5′ position of the 5′ terminal nucleotide. Oligomeric compounds may also have sugar mimetics such as cyclobutyl moieties in place of the pentofuranosyl sugar.

Base Modifications and Substitutions

A subject nucleic acid may also include nucleobase (often referred to in the art simply as “base”) modifications or substitutions. As used herein, “unmodified” or “natural” nucleobases include the purine bases adenine (A) and guanine (G), and the pyrimidine bases thymine (T), cytosine (C) and uracil (U). Modified nucleobases include other synthetic and natural nucleobases such as 5-methylcytosine (5-me-C), 5-hydroxymethyl cytosine, xanthine, hypoxanthine, 2-aminoadenine, 6-methyl and other alkyl derivatives of adenine and guanine, 2-propyl and other alkyl derivatives of adenine and guanine, 2-thiouracil, 2-thiothymine and 2-thiocytosine, 5-halouracil and cytosine, 5-propynyl (—C≡C—CH₃) uracil and cytosine and other alkynyl derivatives of pyrimidine bases, 6-azo uracil, cytosine and thymine, 5-uracil (pseudouracil), 4-thiouracil, 8-halo, 8-amino, 8-thiol, 8-thioalkyl, 8-hydroxyl and other 8-substituted adenines and guanines, 5-halo particularly 5-bromo, 5-trifluoromethyl and other 5-substituted uracils and cytosines, 7-methylguanine and 7-methyladenine, 2-F-adenine, 2-amino-adenine, 8-azaguanine and 8-azaadenine, 7-deazaguanine and 7-deazaadenine and 3-deazaguanine and 3-deazaadenine. Further modified nucleobases include tricyclic pyrimidines such as phenoxazine cytidine (1H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), phenothiazine cytidine (1H-pyrimido(5,4-b)(1,4)benzothiazin-2(3H)-one), G-clamps such as a substituted phenoxazine cytidine (e.g., 9-(2-aminoethoxy)-H-pyrimido(5,4-b)(1,4)benzoxazin-2(3H)-one), carbazole cytidine (2H-pyrimido(4,5-b)indol-2-one), and pyridoindole cytidine (H-pyrido(3′,2′:4,5)pyrrolo(2,3-d)pyrimidin-2-one).

Heterocyclic base moieties may also include those in which the purine or pyrimidine base is replaced with other heterocycles, for example 7-deaza-adenine, 7-deazaguanosine, 2-aminopyridine and 2-pyridone. Further nucleobases include those disclosed in U.S. Pat. No. 3,687,808, those disclosed in The Concise Encyclopedia Of Polymer Science And Engineering, pages 858-859, Kroschwitz, J. I., ed. John Wiley & Sons, 1990, those disclosed by Englisch et al., Angewandte Chemie, International Edition, 1991, 30, 613, and those disclosed by Sanghvi, Y. S., Chapter 15, Antisense Research and Applications, pages 289-302, Crooke, S. T. and Lebleu, B., ed., CRC Press, 1993; the disclosures of which are incorporated herein by reference in their entirety. Certain of these nucleobases are useful for increasing the binding affinity of an oligomeric compound. These include 5-substituted pyrimidines, 6-azapyrimidines and N-2, N-6 and O-6 substituted purines, including 2-aminopropyladenine, 5-propynyluracil and 5-propynylcytosine. 5-methylcytosine substitutions have been shown to increase nucleic acid duplex stability by 0.6-1.2° C. (Sanghvi et al., eds., Antisense Research and Applications, CRC Press, Boca Raton, 1993, pp. 276-278; the disclosure of which is incorporated herein by reference in its entirety) and are suitable base substitutions, e.g., when combined with 2′-O-methoxyethyl sugar modifications.

Conjugates

Another possible modification of a subject nucleic acid involves chemically linking to the polynucleotide one or more moieties or conjugates which enhance the activity, cellular distribution or cellular uptake of the oligonucleotide. These moieties or conjugates can include conjugate groups covalently bound to functional groups such as primary or secondary hydroxyl groups. Conjugate groups include, but are not limited to, intercalators, reporter molecules, polyamines, polyamides, polyethylene glycols, polyethers, groups that enhance the pharmacodynamic properties of oligomers, and groups that enhance the pharmacokinetic properties of oligomers. Suitable conjugate groups include, but are not limited to, cholesterols, lipids, phospholipids, biotin, phenazine, folate, phenanthridine, anthraquinone, acridine, fluoresceins, rhodamines, coumarins, and dyes. Groups that enhance the pharmacodynamic properties include groups that improve uptake, enhance resistance to degradation, and/or strengthen sequence-specific hybridization with the target nucleic acid. Groups that enhance the pharmacokinetic properties include groups that improve uptake, distribution, metabolism or excretion of a subject nucleic acid.

Conjugate moieties include but are not limited to lipid moieties such as a cholesterol moiety (Letsinger et al., Proc. Natl. Acad. Sci. USA, 1989, 86, 6553-6556), cholic acid (Manoharan et al., Bioorg. Med. Chem. Let., 1994, 4, 1053-1060), a thioether, e.g., hexyl-S-tritylthiol (Manoharan et al., Ann. N.Y. Acad. Sci., 1992, 660, 306-309; Manoharan et al., Bioorg. Med. Chem. Let., 1993, 3, 2765-2770), a thiocholesterol (Oberhauser et al., Nucl. Acids Res., 1992, 20, 533-538), an aliphatic chain, e.g., dodecandiol or undecyl residues (Saison-Behmoaras et al., EMBO J., 1991, 10, 1111-1118; Kabanov et al., FEBS Lett., 1990, 259, 327-330; Svinarchuk et al., Biochimie, 1993, 75, 49-54), a phospholipid, e.g., di-hexadecyl-rac-glycerol or triethylammonium 1,2-di-O-hexadecyl-rac-glycero-3-H-phosphonate (Manoharan et al., Tetrahedron Lett., 1995, 36, 3651-3654; Shea et al., Nucl. Acids Res., 1990, 18, 3777-3783), a polyamine or a polyethylene glycol chain (Manoharan et al., Nucleosides & Nucleotides, 1995, 14, 969-973), or adamantane acetic acid (Manoharan et al., Tetrahedron Lett., 1995, 36, 3651-3654), a palmityl moiety (Mishra et al., Biochim. Biophys. Acta, 1995, 1264, 229-237), or an octadecylamine or hexylamino-carbonyl-oxycholesterol moiety (Crooke et al., J. Pharmacol. Exp. Ther., 1996, 277, 923-937).

A conjugate may include a “Protein Transduction Domain” or PTD (also known as a CPP, or cell penetrating peptide), which may refer to a polypeptide, polynucleotide, carbohydrate, or organic or inorganic compound that facilitates traversing a lipid bilayer, micelle, cell membrane, organelle membrane, or vesicle membrane. A PTD attached to another molecule, which can range from a small polar molecule to a large macromolecule and/or a nanoparticle, facilitates the molecule traversing a membrane, for example going from extracellular space to intracellular space, or cytosol to within an organelle (e.g., the nucleus). In some cases, a PTD is covalently linked to the 3′ end of an exogenous polynucleotide. In some cases, a PTD is covalently linked to the 5′ end of an exogenous polynucleotide. Exemplary PTDs include but are not limited to a minimal undecapeptide protein transduction domain (corresponding to residues 47-57 of HIV-1 TAT comprising YGRKKRRQRRR; SEQ ID NO: 130); a polyarginine sequence comprising a number of arginines sufficient to direct entry into a cell (e.g., 3, 4, 5, 6, 7, 8, 9, 10, or 10-50 arginines); a VP22 domain (Zender et al. (2002) Cancer Gene Ther. 9(6):489-96); a Drosophila Antennapedia protein transduction domain (Noguchi et al. (2003) Diabetes 52(7):1732-1737); a truncated human calcitonin peptide (Trehin et al. (2004) Pharm. Research 21:1248-1256); polylysine (Wender et al. (2000) Proc. Natl. Acad. Sci. USA 97:13003-13008); RRQRRTSKLMKR (SEQ ID NO: 131); Transportan GWTLNSAGYLLGKINLKALAALAKKIL (SEQ ID NO: 132); KALAWEAKLAKALAKALAKHLAKALAKALKCEA (SEQ ID NO: 133); and RQIKIWFQNRRMKWKK (SEQ ID NO: 134).
Exemplary PTDs include, but are not limited to, YGRKKRRQRRR (SEQ ID NO: 130); RKKRRQRRR (SEQ ID NO: 135); and an arginine homopolymer of from 3 arginine residues to 50 arginine residues. Exemplary PTD domain amino acid sequences include, but are not limited to, any of the following: YGRKKRRQRRR (SEQ ID NO: 130); RKKRRQRR (SEQ ID NO: 136); YARAAARQARA (SEQ ID NO: 137); THRLPRRRRRR (SEQ ID NO: 138); and GGRRARRRRRR (SEQ ID NO: 139). In some cases, the PTD is an activatable CPP (ACPP) (Aguilera et al. (2009) Integr Biol (Camb) June; 1(5-6):371-381). ACPPs comprise a polycationic CPP (e.g., Arg9 or “R9”) connected via a cleavable linker to a matching polyanion (e.g., Glu9 or “E9”), which reduces the net charge to nearly zero and thereby inhibits adhesion and uptake into cells. Upon cleavage of the linker, the polyanion is released, locally unmasking the polyarginine and its inherent adhesiveness, thus “activating” the ACPP to traverse the membrane.
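The charge masking that underlies an ACPP can be illustrated with a rough residue count. The following Python sketch is illustrative only and rests on a simplifying assumption: each arginine or lysine is counted as +1 and each aspartate or glutamate as −1 at neutral pH, while histidine, the peptide termini, the linker, and any pKa shifts are ignored. The function name `net_charge` is hypothetical and is not part of the present disclosure.

```python
# Simplified net-charge estimate at neutral pH (an assumption, not a pKa model):
# Arg/Lys are counted as +1, Asp/Glu as -1, and every other residue as 0.
CHARGE = {"R": 1, "K": 1, "D": -1, "E": -1}


def net_charge(sequence: str) -> int:
    """Return the approximate net charge of a peptide in one-letter code."""
    return sum(CHARGE.get(residue, 0) for residue in sequence.upper())


r9 = "R" * 9  # polycationic CPP ("R9")
e9 = "E" * 9  # matching polyanion ("E9")

masked = net_charge(r9) + net_charge(e9)  # linker intact: the charges cancel
unmasked = net_charge(r9)                 # linker cleaved, polyanion released
tat = net_charge("YGRKKRRQRRR")           # TAT undecapeptide recited above
```

Under this simplified model the intact R9/E9 pair has a net charge of zero, and cleavage of the linker leaves the polyarginine with its full positive charge, consistent with the unmasking described above.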

Introducing Components into a Target Cell

A CasZ guide RNA (or a nucleic acid comprising a nucleotide sequence encoding same) and/or a CasZ polypeptide (or a nucleic acid comprising a nucleotide sequence encoding same) and/or a CasZ trancRNA (or a nucleic acid that includes a nucleotide sequence encoding same) and/or a donor polynucleotide (donor template) can be introduced into a host cell by any of a variety of well-known methods.

Any of a variety of compounds and methods can be used to deliver to a target cell a CasZ system of the present disclosure. As a non-limiting example, a CasZ system of the present disclosure can be combined with a lipid. As another non-limiting example, a CasZ system of the present disclosure can be combined with a particle, or formulated into a particle.

Methods of introducing a nucleic acid into a host cell are known in the art, and any convenient method can be used to introduce a subject nucleic acid (e.g., an expression construct/vector) into a target cell (e.g., prokaryotic cell, eukaryotic cell, plant cell, animal cell, mammalian cell, human cell, and the like). Suitable methods include, e.g., viral infection, transfection, conjugation, protoplast fusion, lipofection, electroporation, calcium phosphate precipitation, polyethyleneimine (PEI)-mediated transfection, DEAE-dextran mediated transfection, liposome-mediated transfection, particle gun technology, direct microinjection, nanoparticle-mediated nucleic acid delivery (see, e.g., Panyam et al., Adv Drug Deliv Rev. 2012 Sep. 13. pii: S0169-409X(12)00283-9. doi: 10.1016/j.addr.2012.09.023), and the like.

In some cases, a CasZ polypeptide of the present disclosure (e.g., wild type protein, variant protein, chimeric/fusion protein, dCasZ, etc.) is provided as a nucleic acid (e.g., an mRNA, a DNA, a plasmid, an expression vector, a viral vector, etc.) that encodes the CasZ polypeptide. In some cases, the CasZ polypeptide of the present disclosure is provided directly as a protein (e.g., without an associated guide RNA or with an associated guide RNA, i.e., as a ribonucleoprotein complex). A CasZ polypeptide of the present disclosure can be introduced into a cell (provided to the cell) by any convenient method; such methods are known to those of ordinary skill in the art. As an illustrative example, a CasZ polypeptide of the present disclosure can be injected directly into a cell (e.g., with or without a CasZ guide RNA or nucleic acid encoding a CasZ guide RNA, and with or without a donor polynucleotide and with or without a CasZ trancRNA). As another example, a preformed complex of a CasZ polypeptide of the present disclosure and a CasZ guide RNA (an RNP) can be introduced into a cell (e.g., a eukaryotic cell) (e.g., via injection; via nucleofection; via a protein transduction domain (PTD) conjugated to one or more components, e.g., conjugated to the CasZ protein, conjugated to a guide RNA, conjugated to a CasZ trancRNA, conjugated to a CasZ polypeptide of the present disclosure and a guide RNA; etc.).

In some cases, a nucleic acid (e.g., a CasZ guide RNA and/or a nucleic acid encoding it, a nucleic acid encoding a CasZ protein, a CasZ trancRNA and/or a nucleic acid encoding it, and the like) and/or a polypeptide (e.g., a CasZ polypeptide; a CasZ fusion polypeptide) is delivered to a cell (e.g., a target host cell) in a particle, or associated with a particle. In some cases, a CasZ system of the present disclosure is delivered to a cell in a particle, or associated with a particle. The terms “particle” and “nanoparticle” can be used interchangeably, as appropriate. For example, a recombinant expression vector comprising a nucleotide sequence encoding a CasZ polypeptide of the present disclosure and/or a CasZ guide RNA, an mRNA comprising a nucleotide sequence encoding a CasZ polypeptide of the present disclosure, and guide RNA may be delivered simultaneously using particles or lipid envelopes; for instance, a CasZ polypeptide and/or a CasZ guide RNA and/or a trancRNA, e.g., as a complex (e.g., a ribonucleoprotein (RNP) complex), can be delivered via a particle, e.g., a delivery particle comprising lipid or lipidoid and hydrophilic polymer, e.g., a cationic lipid and a hydrophilic polymer, for instance wherein the cationic lipid comprises 1,2-dioleoyl-3-trimethylammonium-propane (DOTAP) or 1,2-ditetradecanoyl-sn-glycero-3-phosphocholine (DMPC) and/or wherein the hydrophilic polymer comprises ethylene glycol or polyethylene glycol (PEG); and/or wherein the particle further comprises cholesterol (e.g., particle from formulation number 1=DOTAP 100, DMPC 0, PEG 0, Cholesterol 0; formulation number 2=DOTAP 90, DMPC 0, PEG 10, Cholesterol 0; formulation number 3=DOTAP 90, DMPC 0, PEG 5, Cholesterol 5).
For example, a particle can be formed using a multistep process in which a CasZ polypeptide and a CasZ guide RNA are mixed together, e.g., at a 1:1 molar ratio, e.g., at room temperature, e.g., for 30 minutes, e.g., in sterile, nuclease free 1× phosphate-buffered saline (PBS); and separately, DOTAP, DMPC, PEG, and cholesterol as applicable for the formulation are dissolved in alcohol, e.g., 100% ethanol; and the two solutions are mixed together to form particles containing the complexes.
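The arithmetic behind mixing two components at a stated molar ratio (e.g., 1:1) can be sketched as follows. This Python sketch is illustrative only; the function name `partner_mass_ug` and the molecular weights used in the example (60 kDa for the protein, 30 kDa for the guide RNA) are hypothetical placeholders chosen for easy arithmetic and are not values taken from the present disclosure.

```python
def partner_mass_ug(protein_ug: float, protein_kda: float,
                    rna_kda: float, molar_ratio: float = 1.0) -> float:
    """Mass of guide RNA (in µg) giving `molar_ratio` mol RNA per mol protein.

    Because 1 kDa equals 1 µg per nmol, dividing µg by kDa yields nmol.
    """
    if protein_kda <= 0 or rna_kda <= 0:
        raise ValueError("molecular weights must be positive")
    protein_nmol = protein_ug / protein_kda       # µg / (µg per nmol) -> nmol
    return protein_nmol * molar_ratio * rna_kda   # nmol * (µg per nmol) -> µg


# Hypothetical example: 120 µg of a 60 kDa protein is 2 nmol, so a 1:1 molar
# ratio calls for 2 nmol of a 30 kDa guide RNA.
rna_ug = partner_mass_ug(protein_ug=120.0, protein_kda=60.0, rna_kda=30.0)
```

Actual molecular weights depend on the particular CasZ polypeptide and guide RNA used and would need to be substituted for the placeholders.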

A CasZ polypeptide of the present disclosure (or an mRNA comprising a nucleotide sequence encoding a CasZ polypeptide of the present disclosure; or a recombinant expression vector comprising a nucleotide sequence encoding a CasZ polypeptide of the present disclosure) and/or CasZ guide RNA (or a nucleic acid such as one or more expression vectors encoding the CasZ guide RNA) may be delivered simultaneously using particles or lipid envelopes. For example, a biodegradable core-shell structured nanoparticle with a poly(β-amino ester) (PBAE) core enveloped by a phospholipid bilayer shell can be used. In some cases, particles/nanoparticles based on self-assembling bioadhesive polymers are used; such particles/nanoparticles may be applied to oral delivery of peptides, intravenous delivery of peptides and nasal delivery of peptides, e.g., to the brain. Other embodiments, such as oral absorption and ocular delivery of hydrophobic drugs are also contemplated. A molecular envelope technology, which involves an engineered polymer envelope which is protected and delivered to the site of the disease, can be used. Doses of about 5 mg/kg can be used, with single or multiple doses, depending on various factors, e.g., the target tissue.

Lipidoid compounds (e.g., as described in US patent application 20110293703) are also useful in the administration of polynucleotides, and can be used to deliver a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure, or a CasZ system of the present disclosure. In one aspect, the aminoalcohol lipidoid compounds are combined with an agent to be delivered to a cell or a subject to form microparticles, nanoparticles, liposomes, or micelles. The aminoalcohol lipidoid compounds may be combined with other aminoalcohol lipidoid compounds, polymers (synthetic or natural), surfactants, cholesterol, carbohydrates, proteins, lipids, etc. to form the particles. These particles may then optionally be combined with a pharmaceutical excipient to form a pharmaceutical composition.

A poly(beta-amino alcohol) (PBAA) can be used to deliver a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure (e.g., a CasZ guide RNA and/or a CasZ trancRNA), or a CasZ system of the present disclosure, to a target cell. US Patent Publication No. 20130302401 relates to a class of poly(beta-amino alcohols) (PBAAs) that has been prepared using combinatorial polymerization.

Sugar-based particles, for example GalNAc, as described in WO2014118272 (incorporated herein by reference) and Nair, J K et al., 2014, Journal of the American Chemical Society 136 (49), 16958-16961, can be used to deliver a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure, or a CasZ system of the present disclosure, to a target cell.

In some cases, lipid nanoparticles (LNPs) are used to deliver a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure (e.g., a CasZ guide RNA and/or a CasZ trancRNA), or a CasZ system of the present disclosure, to a target cell. Negatively charged polymers such as RNA may be loaded into LNPs at low pH values (e.g., pH 4) where the ionizable lipids display a positive charge. However, at physiological pH values, the LNPs exhibit a low surface charge compatible with longer circulation times. Four species of ionizable cationic lipids have been focused upon, namely 1,2-dilineoyl-3-dimethylammonium-propane (DLinDAP), 1,2-dilinoleyloxy-3-N,N-dimethylaminopropane (DLinDMA), 1,2-dilinoleyloxy-keto-N,N-dimethyl-3-aminopropane (DLinK-DMA), and 1,2-dilinoleyl-4-(2-dimethylaminoethyl)-[1,3]-dioxolane (DLinKC2-DMA). Preparation of LNPs is described in, e.g., Rosin et al. (2011) Molecular Therapy 19:1286-2200. The cationic lipids 1,2-dilineoyl-3-dimethylammonium-propane (DLinDAP), 1,2-dilinoleyloxy-3-N,N-dimethylaminopropane (DLinDMA), 1,2-dilinoleyloxyketo-N,N-dimethyl-3-aminopropane (DLinK-DMA), 1,2-dilinoleyl-4-(2-dimethylaminoethyl)-[1,3]-dioxolane (DLinKC2-DMA), 3-o-[2″-(methoxypolyethyleneglycol 2000)succinoyl]-1,2-dimyristoyl-sn-glycol (PEG-S-DMG), and R-3-[(ω-methoxy-poly(ethylene glycol)2000)carbamoyl]-1,2-dimyristyloxypropyl-3-amine (PEG-C-DOMG) may be used. A nucleic acid (e.g., a CasZ guide RNA; a nucleic acid of the present disclosure; etc.) may be encapsulated in LNPs containing DLinDAP, DLinDMA, DLinK-DMA, and DLinKC2-DMA (cationic lipid:DSPC:CHOL:PEG-S-DMG or PEG-C-DOMG at 40:10:40:10 molar ratios). In some cases, 0.2% SP-DiOC18 is incorporated.

Spherical Nucleic Acid (SNA™) constructs and other nanoparticles (particularly gold nanoparticles) can be used to deliver a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure (e.g., a CasZ guide RNA and/or a CasZ trancRNA), or a CasZ system of the present disclosure, to a target cell. See, e.g., Cutler et al., J. Am. Chem. Soc. 2011 133:9254-9257, Hao et al., Small. 2011 7:3158-3162, Zhang et al., ACS Nano. 2011 5:6962-6970, Cutler et al., J. Am. Chem. Soc. 2012 134:1376-1391, Young et al., Nano Lett. 2012 12:3867-71, Zheng et al., Proc. Natl. Acad. Sci. USA. 2012 109:11975-80, Mirkin, Nanomedicine 2012 7:635-638, Zhang et al., J. Am. Chem. Soc. 2012 134:16488-1691, Weintraub, Nature 2013 495:S14-S16, Choi et al., Proc. Natl. Acad. Sci. USA. 2013 110(19): 7625-7630, Jensen et al., Sci. Transl. Med. 5, 209ra152 (2013) and Mirkin, et al., Small, 10:186-192.

Self-assembling nanoparticles with RNA may be constructed with polyethyleneimine (PEI) that is PEGylated with an Arg-Gly-Asp (RGD) peptide ligand attached at the distal end of the polyethylene glycol (PEG).

In general, a “nanoparticle” refers to any particle having a diameter of less than 1000 nm. In some cases, nanoparticles suitable for use in delivering a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure (e.g., a CasZ guide RNA and/or a CasZ trancRNA), or a CasZ system of the present disclosure, to a target cell have a diameter of 500 nm or less, e.g., from 25 nm to 35 nm, from 35 nm to 50 nm, from 50 nm to 75 nm, from 75 nm to 100 nm, from 100 nm to 150 nm, from 150 nm to 200 nm, from 200 nm to 300 nm, from 300 nm to 400 nm, or from 400 nm to 500 nm. In some cases, nanoparticles suitable for use in delivering a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure, or a CasZ system of the present disclosure, to a target cell have a diameter of from 25 nm to 200 nm. In some cases, nanoparticles suitable for use in delivering a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure, or a CasZ system of the present disclosure, to a target cell have a diameter of 100 nm or less. In some cases, nanoparticles suitable for use in delivering a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure, or a CasZ system of the present disclosure, to a target cell have a diameter of from 35 nm to 60 nm.

Nanoparticles suitable for use in delivering a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure (e.g., a CasZ guide RNA and/or a CasZ trancRNA), or a CasZ system of the present disclosure, to a target cell may be provided in different forms, e.g., as solid nanoparticles (e.g., metal such as silver, gold, iron, titanium; non-metal; lipid-based solids; polymers), suspensions of nanoparticles, or combinations thereof. Metal, dielectric, and semiconductor nanoparticles may be prepared, as well as hybrid structures (e.g., core-shell nanoparticles). Nanoparticles made of semiconducting material may also be labeled quantum dots if they are small enough (typically below 10 nm) that quantization of electronic energy levels occurs. Such nanoscale particles are used in biomedical applications as drug carriers or imaging agents and may be adapted for similar purposes in the present disclosure.

Semi-solid and soft nanoparticles are also suitable for use in delivering a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure (e.g., a CasZ guide RNA and/or a CasZ trancRNA), or a CasZ system of the present disclosure, to a target cell. A prototype nanoparticle of semi-solid nature is the liposome.

In some cases, an exosome is used to deliver a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure (e.g., a CasZ guide RNA and/or a CasZ trancRNA), or a CasZ system of the present disclosure, to a target cell. Exosomes are endogenous nano-vesicles that transport RNAs and proteins, and which can deliver RNA to the brain and other target organs.

In some cases, a liposome is used to deliver a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure (e.g., a CasZ guide RNA and/or a CasZ trancRNA), or a CasZ system of the present disclosure, to a target cell. Liposomes are spherical vesicle structures composed of a uni- or multilamellar lipid bilayer surrounding internal aqueous compartments and a relatively impermeable outer lipophilic phospholipid bilayer. Liposomes can be made from several different types of lipids; however, phospholipids are most commonly used to generate liposomes. Although liposome formation is spontaneous when a lipid film is mixed with an aqueous solution, it can also be expedited by applying force in the form of shaking by using a homogenizer, sonicator, or an extrusion apparatus. Several other additives may be added to liposomes in order to modify their structure and properties. For instance, either cholesterol or sphingomyelin may be added to the liposomal mixture in order to help stabilize the liposomal structure and to prevent the leakage of the liposomal inner cargo. A liposome formulation may be mainly comprised of natural phospholipids and lipids such as 1,2-distearoyl-sn-glycero-3-phosphatidyl choline (DSPC), sphingomyelin, egg phosphatidylcholines and monosialoganglioside.

A stable nucleic-acid-lipid particle (SNALP) can be used to deliver a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure (e.g., a CasZ guide RNA and/or a CasZ trancRNA), or a CasZ system of the present disclosure, to a target cell. The SNALP formulation may contain the lipids 3-N-[(methoxypoly(ethylene glycol) 2000) carbamoyl]-1,2-dimyristyloxy-propylamine (PEG-C-DMA), 1,2-dilinoleyloxy-N,N-dimethyl-3-aminopropane (DLinDMA), 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC) and cholesterol, in a 2:40:10:48 molar percent ratio. The SNALP liposomes may be prepared by formulating D-Lin-DMA and PEG-C-DMA with distearoylphosphatidylcholine (DSPC), Cholesterol and siRNA using a 25:1 lipid/siRNA ratio and a 48/40/10/2 molar ratio of Cholesterol/D-Lin-DMA/DSPC/PEG-C-DMA. The resulting SNALP liposomes can be about 80-100 nm in size. A SNALP may comprise synthetic cholesterol (Sigma-Aldrich, St Louis, Mo., USA), dipalmitoylphosphatidylcholine (Avanti Polar Lipids, Alabaster, Ala., USA), 3-N-[(ω-methoxy poly(ethylene glycol)2000)carbamoyl]-1,2-dimyristyloxypropylamine, and cationic 1,2-dilinoleyloxy-3-N,N-dimethylaminopropane. A SNALP may comprise synthetic cholesterol (Sigma-Aldrich), 1,2-distearoyl-sn-glycero-3-phosphocholine (DSPC; Avanti Polar Lipids Inc.), PEG-cDMA, and 1,2-dilinoleyloxy-3-(N,N-dimethyl)aminopropane (DLinDMA).
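A molar percent ratio such as the 2:40:10:48 ratio recited above can be converted into per-component amounts for a chosen total lipid quantity. The following Python sketch is illustrative only; the function name `split_by_molar_ratio` and the 50 µmol total are hypothetical and are not taken from the present disclosure.

```python
def split_by_molar_ratio(total_umol: float, ratio: dict) -> dict:
    """Divide a total lipid amount (µmol) among components by molar ratio."""
    parts = sum(ratio.values())
    if parts <= 0:
        raise ValueError("ratio must contain positive parts")
    return {name: total_umol * share / parts for name, share in ratio.items()}


# PEG-C-DMA : DLinDMA : DSPC : cholesterol at 2:40:10:48, as recited above.
SNALP_RATIO = {"PEG-C-DMA": 2, "DLinDMA": 40, "DSPC": 10, "cholesterol": 48}

# Hypothetical batch of 50 µmol total lipid.
amounts = split_by_molar_ratio(50.0, SNALP_RATIO)
```

The same helper applies to the other molar ratios recited herein (e.g., 40:10:40:10 or 50/10/38.5/1.5), because the parts are normalized by their sum and need not total 100.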

Other cationic lipids, such as amino lipid 2,2-dilinoleyl-4-dimethylaminoethyl-[1,3]-dioxolane (DLin-KC2-DMA) can be used to deliver a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure (e.g., a CasZ guide RNA and/or a CasZ trancRNA), or a CasZ system of the present disclosure, to a target cell. A preformed vesicle with the following lipid composition may be contemplated: amino lipid, distearoylphosphatidylcholine (DSPC), cholesterol and (R)-2,3-bis(octadecyloxy) propyl-1-(methoxy poly(ethylene glycol)2000)propylcarbamate (PEG-lipid) in the molar ratio 40/10/40/10, respectively, and a FVII siRNA/total lipid ratio of approximately 0.05 (w/w). To ensure a narrow particle size distribution in the range of 70-90 nm and a low polydispersity index of 0.11±0.04 (n=56), the particles may be extruded up to three times through 80 nm membranes prior to adding the guide RNA. Particles containing the highly potent amino lipid 16 may be used, in which the molar ratio of the four lipid components 16, DSPC, cholesterol and PEG-lipid (50/10/38.5/1.5) may be further optimized to enhance in vivo activity.

Lipids may be formulated with a CasZ system of the present disclosure or component(s) thereof or nucleic acids encoding the same to form lipid nanoparticles (LNPs). Suitable lipids include, but are not limited to, DLin-KC2-DMA4, C12-200 and the colipids distearoylphosphatidyl choline, cholesterol, and PEG-DMG, which may be formulated with a CasZ system, or component thereof, of the present disclosure, using a spontaneous vesicle formation procedure. The component molar ratio may be about 50/10/38.5/1.5 (DLin-KC2-DMA or C12-200/distearoylphosphatidyl choline/cholesterol/PEG-DMG).

A CasZ system of the present disclosure, or a component thereof, may be delivered encapsulated in PLGA microspheres such as those further described in US published applications 20130252281, 20130245107 and 20130244279.

Supercharged proteins can be used to deliver a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure (e.g., a CasZ guide RNA and/or a CasZ trancRNA), or a CasZ system of the present disclosure, to a target cell. Supercharged proteins are a class of engineered or naturally occurring proteins with unusually high positive or negative net theoretical charge. Both supernegatively and superpositively charged proteins exhibit the ability to withstand thermally or chemically induced aggregation. Superpositively charged proteins are also able to penetrate mammalian cells. Associating cargo with these proteins, such as plasmid DNA, RNA, or other proteins, can enable the functional delivery of these macromolecules into mammalian cells both in vitro and in vivo.

Cell Penetrating Peptides (CPPs) can be used to deliver a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure (e.g., a CasZ guide RNA and/or a CasZ trancRNA), or a CasZ system of the present disclosure, to a target cell. CPPs typically have an amino acid composition that either contains a high relative abundance of positively charged amino acids such as lysine or arginine or has sequences that contain an alternating pattern of polar/charged amino acids and non-polar, hydrophobic amino acids.

An implantable device can be used to deliver a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure (e.g., a CasZ guide RNA and/or a CasZ trancRNA) (e.g., a CasZ guide RNA, a nucleic acid encoding a CasZ guide RNA, a nucleic acid encoding CasZ polypeptide, a donor template, and the like), or a CasZ system of the present disclosure, to a target cell (e.g., a target cell in vivo, where the target cell is a target cell in circulation, a target cell in a tissue, a target cell in an organ, etc.). An implantable device suitable for use in delivering a CasZ polypeptide of the present disclosure, a CasZ fusion polypeptide of the present disclosure, an RNP of the present disclosure, a nucleic acid of the present disclosure (e.g., a CasZ guide RNA and/or a CasZ trancRNA), or a CasZ system of the present disclosure, to a target cell (e.g., a target cell in vivo, where the target cell is a target cell in circulation, a target cell in a tissue, a target cell in an organ, etc.) can include a container (e.g., a reservoir, a matrix, etc.) that comprises the CasZ polypeptide, the CasZ fusion polypeptide, the RNP, or the CasZ system (or component thereof, e.g., a nucleic acid of the present disclosure).

A suitable implantable device can comprise a polymeric substrate, such as a matrix for example, that is used as the device body, and in some cases additional scaffolding materials, such as metals or additional polymers, and materials to enhance visibility and imaging. An implantable delivery device can be advantageous in providing release locally and over a prolonged period, where the polypeptide and/or nucleic acid to be delivered is released directly to a target site, e.g., the extracellular matrix (ECM), the vasculature surrounding a tumor, a diseased tissue, etc. Suitable implantable delivery devices include devices suitable for use in delivering to a cavity such as the abdominal cavity and/or any other type of administration in which the drug delivery system is not anchored or attached, comprising a biostable and/or degradable and/or bioabsorbable polymeric substrate, which may for example optionally be a matrix. In some cases, a suitable implantable drug delivery device comprises degradable polymers, wherein the main release mechanism is bulk erosion. In some cases, a suitable implantable drug delivery device comprises non-degradable or slowly degraded polymers, wherein the main release mechanism is diffusion rather than bulk erosion, so that the outer part functions as a membrane, and its internal part functions as a drug reservoir, which practically is not affected by the surroundings for an extended period (for example from about a week to about a few months). Combinations of different polymers with different release mechanisms may also optionally be used. The concentration gradient at the surface can be maintained effectively constant during a significant period of the total releasing period, and therefore the diffusion rate is effectively constant (termed “zero mode” diffusion).
By the term “constant” it is meant a diffusion rate that ismaintained above the lower threshold of therapeutic effectiveness, butwhich may still optionally feature an initial burst and/or mayfluctuate, for example increasing and decreasing to a certain degree.The diffusion rate can be so maintained for a prolonged period, and itcan be considered constant to a certain level to optimize thetherapeutically effective period, for example the effective silencingperiod.
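The working definition of a “constant” diffusion rate given above (a rate that stays above the lower threshold of therapeutic effectiveness, even if it starts with a burst or fluctuates) can be illustrated with a minimal sketch. The function names, the sampled release rates, the threshold value, and the arbitrary units are all hypothetical and are not taken from the present disclosure; treating a rate exactly equal to the threshold as effective is likewise an assumption.

```python
def is_effectively_constant(rates, threshold):
    """True if every sampled release rate is at or above the threshold.

    Hypothetical helper: an initial burst or later fluctuation is allowed,
    as long as the rate never drops below the therapeutic threshold.
    """
    return all(rate >= threshold for rate in rates)


def effective_period(rates, threshold, interval=1.0):
    """Total time (in units of `interval`) spent at or above the threshold."""
    return sum(interval for rate in rates if rate >= threshold)


# Illustrative values only (arbitrary units, one sample per day).
daily_rates = [5.0, 3.2, 2.9, 3.1, 3.0, 2.8]   # initial burst, then fluctuation
print(is_effectively_constant(daily_rates, threshold=2.5))
print(effective_period(daily_rates, threshold=2.5))
```

Under this reading, a profile with a pronounced initial burst still counts as “constant” provided it never falls below the threshold during the period of interest.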

In some cases, the implantable delivery system is designed to shield the nucleotide-based therapeutic agent from degradation, whether chemical in nature or due to attack from enzymes and other factors in the body of the subject.

The site for implantation of the device, or target site, can be selected for maximum therapeutic efficacy. For example, a delivery device can be implanted within or in the proximity of a tumor environment, or the blood supply associated with a tumor. The target location can be, e.g.: 1) the brain at degenerative sites, as in Parkinson's or Alzheimer's disease, at the basal ganglia, white and gray matter; 2) the spine, as in the case of amyotrophic lateral sclerosis (ALS); 3) uterine cervix; 4) active and chronic inflammatory joints; 5) dermis, as in the case of psoriasis; 6) sympathetic and sensory nervous sites for analgesic effect; 7) a bone; 8) a site of acute or chronic infection; 9) intravaginal; 10) inner ear-auditory system, labyrinth of the inner ear, vestibular system; 11) intratracheal; 12) intracardiac; coronary, epicardiac; 13) urinary tract or bladder; 14) biliary system; 15) parenchymal tissue including and not limited to the kidney, liver, spleen; 16) lymph nodes; 17) salivary glands; 18) dental gums; 19) intra-articular (into joints); 20) intraocular; 21) brain tissue; 22) brain ventricles; 23) cavities, including abdominal cavity (for example but without limitation, for ovarian cancer); 24) intraesophageal; 25) intrarectal; and 26) into the vasculature.

The method of insertion, such as implantation, can be a method already used for other types of tissue implantation, for insertions, and/or for sampling tissues, optionally without modification or with only minor modifications. Such methods optionally include but are not limited to brachytherapy methods, biopsy, endoscopy with and/or without ultrasound, such as stereotactic methods into the brain tissue, laparoscopy, including implantation with a laparoscope into joints, abdominal organs, the bladder wall and body cavities.

Modified Host Cells

The present disclosure provides a modified cell comprising a CasZ polypeptide of the present disclosure and/or a nucleic acid comprising a nucleotide sequence encoding a CasZ polypeptide of the present disclosure. The present disclosure provides a modified cell comprising a CasZ polypeptide of the present disclosure, where the modified cell is a cell that does not normally comprise a CasZ polypeptide of the present disclosure. The present disclosure provides a modified cell (e.g., a genetically modified cell) comprising a nucleic acid comprising a nucleotide sequence encoding a CasZ polypeptide of the present disclosure. The present disclosure provides a genetically modified cell that is genetically modified with an mRNA comprising a nucleotide sequence encoding a CasZ polypeptide of the present disclosure. The present disclosure provides a genetically modified cell that is genetically modified with a recombinant expression vector comprising a nucleotide sequence encoding a CasZ polypeptide of the present disclosure. The present disclosure provides a genetically modified cell that is genetically modified with a recombinant expression vector comprising: a) a nucleotide sequence encoding a CasZ polypeptide of the present disclosure; and b) a nucleotide sequence encoding a CasZ guide RNA of the present disclosure. The present disclosure provides a genetically modified cell that is genetically modified with a recombinant expression vector comprising: a) a nucleotide sequence encoding a CasZ polypeptide of the present disclosure; b) a nucleotide sequence encoding a CasZ guide RNA of the present disclosure; and c) a nucleotide sequence encoding a donor template.

A cell that serves as a recipient for a CasZ polypeptide of the present disclosure and/or a nucleic acid comprising a nucleotide sequence encoding a CasZ polypeptide of the present disclosure and/or a CasZ guide RNA of the present disclosure (or a nucleic acid encoding it) and/or a CasZ trancRNA (or a nucleic acid encoding it), can be any of a variety of cells, including, e.g., in vitro cells; in vivo cells; ex vivo cells; primary cells; cancer cells; animal cells; plant cells; algal cells; fungal cells; etc. A cell that serves as a recipient for a CasZ polypeptide of the present disclosure and/or a nucleic acid comprising a nucleotide sequence encoding a CasZ polypeptide of the present disclosure and/or a CasZ guide RNA of the present disclosure is referred to as a “host cell” or a “target cell.” A host cell or a target cell can be a recipient of a CasZ system of the present disclosure. A host cell or a target cell can be a recipient of a CasZ RNP of the present disclosure. A host cell or a target cell can be a recipient of a single component of a CasZ system of the present disclosure.

Non-limiting examples of cells (target cells) include: a prokaryotic cell, a eukaryotic cell, a bacterial cell, an archaeal cell, a cell of a single-cell eukaryotic organism, a protozoan cell, a cell from a plant (e.g., cells from plant crops, fruits, vegetables, grains, soy bean, corn, maize, wheat, seeds, tomatoes, rice, cassava, sugarcane, pumpkin, hay, potatoes, cotton, cannabis, tobacco, flowering plants, conifers, gymnosperms, angiosperms, ferns, clubmosses, hornworts, liverworts, mosses, dicotyledons, monocotyledons, etc.), an algal cell (e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens C. Agardh, and the like), a seaweed cell (e.g., a kelp cell), a fungal cell (e.g., a yeast cell, a cell from a mushroom), an animal cell, a cell from an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, etc.), a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal), a cell from a mammal (e.g., an ungulate (e.g., a pig, a cow, a goat, a sheep); a rodent (e.g., a rat, a mouse); a non-human primate; a human; a feline (e.g., a cat); a canine (e.g., a dog); etc.), and the like. In some cases, the cell is a cell that does not originate from a natural organism (e.g., the cell can be a synthetically made cell; also referred to as an artificial cell).

A cell can be an in vitro cell (e.g., a cell in culture, e.g., an established cultured cell line). A cell can be an ex vivo cell (e.g., a cultured cell from an individual). A cell can be an in vivo cell (e.g., a cell in an individual). A cell can be an isolated cell. A cell can be a cell inside of an organism. A cell can be an organism. A cell can be a cell in a cell culture (e.g., in vitro cell culture). A cell can be one of a collection of cells. A cell can be a prokaryotic cell or derived from a prokaryotic cell. A cell can be a bacterial cell or can be derived from a bacterial cell. A cell can be an archaeal cell or derived from an archaeal cell. A cell can be a eukaryotic cell or derived from a eukaryotic cell. A cell can be a plant cell or derived from a plant cell. A cell can be an animal cell or derived from an animal cell. A cell can be an invertebrate cell or derived from an invertebrate cell. A cell can be a vertebrate cell or derived from a vertebrate cell. A cell can be a mammalian cell or derived from a mammalian cell. A cell can be a rodent cell or derived from a rodent cell. A cell can be a human cell or derived from a human cell. A cell can be a microbe cell or derived from a microbe cell. A cell can be a fungal cell or derived from a fungal cell. A cell can be an insect cell. A cell can be an arthropod cell. A cell can be a protozoan cell. A cell can be a helminth cell.

Suitable cells include a stem cell (e.g., an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell); a germ cell (e.g., an oocyte, a sperm, an oogonium, a spermatogonium, etc.); a somatic cell, e.g., a fibroblast, an oligodendrocyte, a glial cell, a hematopoietic cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell, etc.

Suitable cells include human embryonic stem cells, fetal cardiomyocytes, myofibroblasts, mesenchymal stem cells, autotransplanted expanded cardiomyocytes, adipocytes, totipotent cells, pluripotent cells, blood stem cells, myoblasts, adult stem cells, bone marrow cells, mesenchymal cells, embryonic stem cells, parenchymal cells, epithelial cells, endothelial cells, mesothelial cells, fibroblasts, osteoblasts, chondrocytes, exogenous cells, endogenous cells, stem cells, hematopoietic stem cells, bone-marrow derived progenitor cells, myocardial cells, skeletal cells, fetal cells, undifferentiated cells, multi-potent progenitor cells, unipotent progenitor cells, monocytes, cardiac myoblasts, skeletal myoblasts, macrophages, capillary endothelial cells, xenogenic cells, allogenic cells, and post-natal stem cells.

In some cases, the cell is an immune cell, a neuron, an epithelial cell, an endothelial cell, or a stem cell. In some cases, the immune cell is a T cell, a B cell, a monocyte, a natural killer cell, a dendritic cell, or a macrophage. In some cases, the immune cell is a cytotoxic T cell. In some cases, the immune cell is a helper T cell. In some cases, the immune cell is a regulatory T cell (Treg).

In some cases, the cell is a stem cell. Stem cells include adult stem cells. Adult stem cells are also referred to as somatic stem cells.

Adult stem cells are resident in differentiated tissue, but retain the properties of self-renewal and ability to give rise to multiple cell types, usually cell types typical of the tissue in which the stem cells are found. Numerous examples of somatic stem cells are known to those of skill in the art, including muscle stem cells; hematopoietic stem cells; epithelial stem cells; neural stem cells; mesenchymal stem cells; mammary stem cells; intestinal stem cells; mesodermal stem cells; endothelial stem cells; olfactory stem cells; neural crest stem cells; and the like.

Stem cells of interest include mammalian stem cells, where the term “mammalian” refers to any animal classified as a mammal, including humans; non-human primates; domestic and farm animals; and zoo, laboratory, sports, or pet animals, such as dogs, horses, cats, cows, mice, rats, rabbits, etc. In some cases, the stem cell is a human stem cell. In some cases, the stem cell is a rodent (e.g., a mouse; a rat) stem cell. In some cases, the stem cell is a non-human primate stem cell.

Stem cells can express one or more stem cell markers, e.g., SOX9, KRT19, KRT7, LGR5, CA9, FXYD2, CDH6, CLDN18, TSPAN8, BPIFB1, OLFM4, CDH17, and PPARGC1A.

In some cases, the stem cell is a hematopoietic stem cell (HSC). HSCs are mesoderm-derived cells that can be isolated from bone marrow, blood, cord blood, fetal liver and yolk sac. HSCs are characterized as CD34⁺ and CD3⁻. HSCs can repopulate the erythroid, neutrophil-macrophage, megakaryocyte and lymphoid hematopoietic cell lineages in vivo. In vitro, HSCs can be induced to undergo at least some self-renewing cell divisions and can be induced to differentiate to the same lineages as are seen in vivo. As such, HSCs can be induced to differentiate into one or more of erythroid cells, megakaryocytes, neutrophils, macrophages, and lymphoid cells.

In other cases, the stem cell is a neural stem cell (NSC). NSCs are capable of differentiating into neurons and glia (including oligodendrocytes and astrocytes). A neural stem cell is a multipotent stem cell that is capable of multiple divisions and, under specific conditions, can produce daughter cells that are neural stem cells, or neural progenitor cells that can be neuroblasts or glioblasts, e.g., cells committed to become one or more types of neurons and glial cells, respectively. Methods of obtaining NSCs are known in the art.

In other cases, the stem cell is a mesenchymal stem cell (MSC). MSCs, originally derived from the embryonal mesoderm and isolated from adult bone marrow, can differentiate to form muscle, bone, cartilage, fat, marrow stroma, and tendon. Methods of isolating MSCs are known in the art, and any known method can be used to obtain MSCs. See, e.g., U.S. Pat. No. 5,736,396, which describes isolation of human MSCs.

A cell is in some cases a plant cell. A plant cell can be a cell of a monocotyledon. A cell can be a cell of a dicotyledon.

In some cases, the cell is a plant cell. For example, the cell can be a cell of a major agricultural plant, e.g., Barley, Beans (Dry Edible), Canola, Corn, Cotton (Pima), Cotton (Upland), Flaxseed, Hay (Alfalfa), Hay (Non-Alfalfa), Oats, Peanuts, Rice, Sorghum, Soybeans, Sugarbeets, Sugarcane, Sunflowers (Oil), Sunflowers (Non-Oil), Sweet Potatoes, Tobacco (Burley), Tobacco (Flue-cured), Tomatoes, Wheat (Durum), Wheat (Spring), Wheat (Winter), and the like. As another example, the cell is a cell of a vegetable crop, including but not limited to, e.g., alfalfa sprouts, aloe leaves, arrow root, arrowhead, artichokes, asparagus, bamboo shoots, banana flowers, bean sprouts, beans, beet tops, beets, bittermelon, bok choy, broccoli, broccoli rabe (rappini), brussels sprouts, cabbage, cabbage sprouts, cactus leaf (nopales), calabaza, cardoon, carrots, cauliflower, celery, chayote, chinese artichoke (crosnes), chinese cabbage, chinese celery, chinese chives, choy sum, chrysanthemum leaves (tung ho), collard greens, corn stalks, corn-sweet, cucumbers, daikon, dandelion greens, dasheen, dau mue (pea tips), donqua (winter melon), eggplant, endive, escarole, fiddle head ferns, field cress, frisee, gai choy (chinese mustard), gailon, galanga (siam, thai ginger), garlic, ginger root, gobo, greens, hanover salad greens, huauzontle, jerusalem artichokes, jicama, kale greens, kohlrabi, lamb's quarters (quilete), lettuce (bibb), lettuce (boston), lettuce (boston red), lettuce (green leaf), lettuce (iceberg), lettuce (lolla rossa), lettuce (oak leaf, green), lettuce (oak leaf, red), lettuce (processed), lettuce (red leaf), lettuce (romaine), lettuce (ruby romaine), lettuce (russian red mustard), linkok, lo bok, long beans, lotus root, mache, maguey (agave) leaves, malanga, mesclun mix, mizuna, moap (smooth luffa), moo, moqua (fuzzy squash), mushrooms, mustard, nagaimo, okra, ong choy, onions green, opo (long squash), ornamental corn, ornamental gourds, parsley, parsnips, peas, peppers (bell type), peppers, pumpkins, radicchio, radish sprouts, radishes, rape greens, rhubarb, romaine (baby red), rutabagas, salicornia (sea bean), sinqua (angled/ridged luffa), spinach, squash, straw bales, sugarcane, sweet potatoes, swiss chard, tamarindo, taro, taro leaf, taro shoots, tatsoi, tepeguaje (guaje), tindora, tomatillos, tomatoes, tomatoes (cherry), tomatoes (grape type), tomatoes (plum type), turmeric, turnip tops greens, turnips, water chestnuts, yampi, yams (names), yu choy, yuca (cassava), and the like.

A cell is in some cases an arthropod cell. For example, the cell can be a cell of a sub-order, a family, a sub-family, a group, a sub-group, or a species of, e.g., Chelicerata, Myriapoda, Hexapoda, Arachnida, Insecta, Archaeognatha, Thysanura, Palaeoptera, Ephemeroptera, Odonata, Anisoptera, Zygoptera, Neoptera, Exopterygota, Plecoptera, Embioptera, Orthoptera, Zoraptera, Dermaptera, Dictyoptera, Notoptera, Grylloblattidae, Mantophasmatidae, Phasmatodea, Blattaria, Isoptera, Mantodea, Parapneuroptera, Psocoptera, Thysanoptera, Phthiraptera, Hemiptera, Endopterygota or Holometabola, Hymenoptera, Coleoptera, Strepsiptera, Raphidioptera, Megaloptera, Neuroptera, Mecoptera, Siphonaptera, Diptera, Trichoptera, or Lepidoptera.

A cell is in some cases an insect cell. For example, in some cases, the cell is a cell of a mosquito, a grasshopper, a true bug, a fly, a flea, a bee, a wasp, an ant, a louse, a moth, or a beetle.

Kits

The present disclosure provides a kit comprising a CasZ system of the present disclosure, or a component of a CasZ system of the present disclosure.

A kit of the present disclosure can comprise any combination as listed for a CasZ system (e.g., see above). A kit of the present disclosure can comprise: a) a component, as described above, of a CasZ system of the present disclosure, or can comprise a CasZ system of the present disclosure; and b) one or more additional reagents, e.g., i) a buffer; ii) a protease inhibitor; iii) a nuclease inhibitor; iv) a reagent required to develop or visualize a detectable label; v) a positive and/or negative control target DNA; vi) a positive and/or negative control CasZ guide RNA; vii) a CasZ trancRNA; and the like. A kit of the present disclosure can comprise: a) a component, as described above, of a CasZ system of the present disclosure, or can comprise a CasZ system of the present disclosure; and b) a therapeutic agent.

A kit of the present disclosure can comprise a recombinant expression vector comprising: a) an insertion site for inserting a nucleic acid comprising a nucleotide sequence encoding a portion of a CasZ guide RNA that hybridizes to a target nucleotide sequence in a target nucleic acid; and b) a nucleotide sequence encoding the CasZ-binding portion of a CasZ guide RNA. A kit of the present disclosure can comprise a recombinant expression vector comprising: a) an insertion site for inserting a nucleic acid comprising a nucleotide sequence encoding a portion of a CasZ guide RNA that hybridizes to a target nucleotide sequence in a target nucleic acid; b) a nucleotide sequence encoding the CasZ-binding portion of a CasZ guide RNA; and c) a nucleotide sequence encoding a CasZ polypeptide of the present disclosure. A kit of the present disclosure can comprise a recombinant expression vector comprising a nucleotide sequence encoding a CasZ trancRNA.

Detection of ssDNA

A CasZ (Cas14) polypeptide of the present disclosure, once activated by detection of a target DNA (double or single stranded), can promiscuously cleave non-targeted single stranded DNA (ssDNA). Once a CasZ (Cas14) polypeptide is activated by a guide RNA, which occurs when the guide RNA hybridizes to a target sequence of a target DNA (i.e., the sample includes the target DNA, e.g., target ssDNA), the protein becomes a nuclease that promiscuously cleaves ssDNAs (i.e., the nuclease cleaves non-target ssDNAs, i.e., ssDNAs to which the guide sequence of the guide RNA does not hybridize). Thus, when the target DNA is present in the sample (e.g., in some cases above a threshold amount), the result is cleavage of ssDNAs in the sample, which can be detected using any convenient detection method (e.g., using a labeled single stranded detector DNA). In some cases, a CasZ polypeptide requires, in addition to a CasZ guide RNA, a trancRNA for activation.

Provided are compositions and methods for detecting a target DNA (double stranded or single stranded) in a sample. In some cases, a detector DNA is used that is single stranded (ssDNA) and does not hybridize with the guide sequence of the guide RNA (i.e., the detector ssDNA is a non-target ssDNA). Such methods can include (a) contacting the sample with: (i) a CasZ polypeptide; (ii) a guide RNA comprising: a region that binds to the CasZ polypeptide, and a guide sequence that hybridizes with the target DNA; and (iii) a detector DNA that is single stranded and does not hybridize with the guide sequence of the guide RNA; and (b) measuring a detectable signal produced by cleavage of the single stranded detector DNA by the CasZ polypeptide, thereby detecting the target DNA. In some cases, the methods can include (a) contacting the sample with: (i) a CasZ polypeptide; (ii) a guide RNA comprising: a region that binds to the CasZ polypeptide, and a guide sequence that hybridizes with the target DNA; (iii) a CasZ trancRNA; and (iv) a detector DNA that is single stranded and does not hybridize with the guide sequence of the guide RNA; and (b) measuring a detectable signal produced by cleavage of the single stranded detector DNA by the CasZ polypeptide, thereby detecting the target DNA. As noted above, once a subject CasZ polypeptide is activated by a guide RNA, which occurs when the sample includes a target DNA to which the guide RNA hybridizes (i.e., the sample includes the targeted DNA), the CasZ polypeptide functions as a nuclease that non-specifically cleaves ssDNAs (including non-target ssDNAs) present in the sample. Thus, when the targeted DNA is present in the sample (e.g., in some cases above a threshold amount), the result is cleavage of ssDNA (including non-target ssDNA) in the sample, which can be detected using any convenient detection method (e.g., using a labeled detector ssDNA).
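The logic of the detection method described above can be sketched as a toy model: a signal is expected only when the sample contains a DNA to which the guide sequence hybridizes, and the detector ssDNA must itself be a non-target. This is a deliberately simplified illustration, not a description of CasZ biochemistry. It assumes perfect Watson-Crick complementarity and ignores mismatches, reaction conditions, any trancRNA requirement, and signal quantification. All function names and sequences are hypothetical.

```python
# Toy model only: hybridization is reduced to exact reverse-complement matching.
_RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def target_site(guide_sequence):
    """DNA sequence (written 5'->3') that pairs with an RNA guide sequence."""
    return "".join(_RNA_TO_DNA_COMPLEMENT[base] for base in reversed(guide_sequence))


def hybridizes(guide_sequence, dna):
    """True if the DNA strand contains a site fully complementary to the guide."""
    return target_site(guide_sequence) in dna


def detect_target(sample, guide_sequence, detector):
    """Return True if a detectable signal would be expected in this toy model.

    `sample` is a list of DNA strands; `detector` is the labeled ssDNA, which
    must not hybridize with the guide sequence (i.e., it is a non-target ssDNA).
    """
    if hybridizes(guide_sequence, detector):
        raise ValueError("detector DNA must not hybridize with the guide sequence")
    # Activation (and hence cleavage of the detector) requires a target in the sample.
    return any(hybridizes(guide_sequence, strand) for strand in sample)


guide = "ACGUACGUAC"  # hypothetical guide sequence
print(detect_target(["TTGTACGTACGTAA", "CCCCCC"], guide, detector="TTTTTT"))
print(detect_target(["CCCCCC"], guide, detector="TTTTTT"))
```

The check on the detector mirrors the requirement in the text that the detector DNA does not hybridize with the guide sequence of the guide RNA.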

Also provided are compositions and methods for cleaving single stranded DNAs (ssDNAs) (e.g., non-target ssDNAs). Such methods can include contacting a population of nucleic acids, wherein said population comprises a target DNA and a plurality of non-target ssDNAs, with: (i) a CasZ polypeptide; and (ii) a guide RNA comprising: a region that binds to the CasZ polypeptide, and a guide sequence that hybridizes with the target DNA, wherein the CasZ polypeptide cleaves non-target ssDNAs of said plurality. Such methods can include contacting a population of nucleic acids, wherein said population comprises a target DNA and a plurality of non-target ssDNAs, with: (i) a CasZ polypeptide; (ii) a guide RNA comprising: a region that binds to the CasZ polypeptide, and a guide sequence that hybridizes with the target DNA; and (iii) a CasZ trancRNA, wherein the CasZ polypeptide cleaves non-target ssDNAs of said plurality. Such methods can be used, e.g., to cleave foreign ssDNAs (e.g., viral DNAs) in a cell.

The contacting step of a subject method can be carried out in a composition comprising divalent metal ions. The contacting step can be carried out in an acellular environment, e.g., outside of a cell. The contacting step can be carried out inside a cell. The contacting step can be carried out in a cell in vitro. The contacting step can be carried out in a cell ex vivo. The contacting step can be carried out in a cell in vivo.

The guide RNA can be provided as RNA or as a nucleic acid encoding the guide RNA (e.g., a DNA such as a recombinant expression vector). The trancRNA can be provided as RNA or as a nucleic acid encoding the trancRNA (e.g., a DNA such as a recombinant expression vector). The CasZ polypeptide can be provided as a protein per se or as a nucleic acid encoding the protein (e.g., an mRNA, a DNA such as a recombinant expression vector). In some cases, two or more (e.g., 3 or more, 4 or more, 5 or more, or 6 or more) guide RNAs can be provided. In some cases, a single-molecule RNA comprising: i) a CasZ guide RNA; and ii) a trancRNA (or a nucleic acid comprising a nucleotide sequence encoding the single-molecule RNA) is used.

In some cases (e.g., when contacting a sample with a guide RNA and a CasZ polypeptide; or when contacting a sample with a guide RNA, a CasZ polypeptide, and a trancRNA), the sample is contacted for 2 hours or less (e.g., 1.5 hours or less, 1 hour or less, 40 minutes or less, 30 minutes or less, 20 minutes or less, 10 minutes or less, 5 minutes or less, or 1 minute or less) prior to the measuring step. For example, in some cases the sample is contacted for 40 minutes or less, 20 minutes or less, 10 minutes or less, 5 minutes or less, or 1 minute or less prior to the measuring step. In some cases, the sample is contacted for from 50 seconds to 60 seconds, from 40 seconds to 50 seconds, from 30 seconds to 40 seconds, from 20 seconds to 30 seconds, or from 10 seconds to 20 seconds prior to the measuring step.

In some cases, a method of the present disclosure for detecting a target DNA comprises: a) contacting a sample with a guide RNA, a CasZ polypeptide, and a detector DNA, where the sample is contacted for 2 hours or less (e.g., 1.5 hours or less, 1 hour or less, 40 minutes or less, 30 minutes or less, 20 minutes or less, 10 minutes or less, 5 minutes or less, or 1 minute or less), under conditions that provide for trans cleavage of the detector DNA; b) maintaining the sample from step (a) for a period of time under conditions that do not provide for trans cleavage of the detector DNA; and c) after the time period of step (b), measuring a detectable signal produced by cleavage of the single stranded detector DNA by the CasZ polypeptide, thereby detecting the target DNA. Conditions that provide for trans cleavage of the detector DNA include temperature conditions such as from 17° C. to about 39° C. (e.g., about 37° C.). Conditions that do not provide for trans cleavage of the detector DNA include temperatures of 10° C. or less, 5° C. or less, 4° C. or less, or 0° C.

In some cases, a method of the present disclosure for detecting a target DNA comprises: a) contacting a sample with a guide RNA, a trancRNA, a CasZ polypeptide, and a detector DNA (or contacting a sample with: i) a single-molecule RNA comprising a guide RNA and a trancRNA; ii) a CasZ polypeptide; and iii) a detector DNA), where the sample is contacted for 2 hours or less (e.g., 1.5 hours or less, 1 hour or less, 40 minutes or less, 30 minutes or less, 20 minutes or less, 10 minutes or less, 5 minutes or less, or 1 minute or less), under conditions that provide for trans cleavage of the detector DNA; b) maintaining the sample from step (a) for a period of time under conditions that do not provide for trans cleavage of the detector DNA; and c) after the time period of step (b), measuring a detectable signal produced by cleavage of the single stranded detector DNA by the CasZ polypeptide, thereby detecting the target DNA. Conditions that provide for trans cleavage of the detector DNA include temperature conditions such as from 17° C. to about 39° C. (e.g., about 37° C.). Conditions that do not provide for trans cleavage of the detector DNA include temperatures of 10° C. or less, 5° C. or less, 4° C. or less, or 0° C.
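The temperature conditions recited above can be captured in a small helper that labels a holding temperature as providing for trans cleavage, not providing for it, or falling outside the recited examples. This is only a sketch: the function name and labels are hypothetical, the "about 39° C." upper bound is treated as exactly 39, and temperatures between the two recited ranges are reported as unspecified rather than guessed.

```python
def trans_cleavage_condition(temp_c):
    """Classify a temperature (degrees C) against the example ranges in the text.

    Hypothetical helper. Returns:
      "permissive"     for 17 to 39 (conditions that provide for trans cleavage)
      "non-permissive" for 10 or less (conditions that do not)
      "unspecified"    for anything the recited examples do not cover
    """
    if 17.0 <= temp_c <= 39.0:
        return "permissive"
    if temp_c <= 10.0:
        return "non-permissive"
    return "unspecified"


# Step (a) at about 37 C, then step (b) on ice or in a refrigerator.
for temperature in (37.0, 4.0, 0.0, 12.0):
    print(temperature, trans_cleavage_condition(temperature))
```

A protocol following steps (a) to (c) would use a permissive temperature for the contacting step and a non-permissive temperature while the sample is held before measurement.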

In some cases, a detectable signal produced by cleavage of a single-stranded detector DNA is produced for no more than 60 minutes. For example, in some cases, a detectable signal produced by cleavage of a single-stranded detector DNA is produced for no more than 60 minutes, no more than 45 minutes, no more than 30 minutes, no more than 15 minutes, no more than 10 minutes, or no more than 5 minutes. For example, in some cases, a detectable signal produced by cleavage of a single-stranded detector DNA is produced for a period of time of from 1 minute to 60 minutes, e.g., from 1 minute to 5 minutes, from 5 minutes to 10 minutes, from 10 minutes to 15 minutes, from 15 minutes to 30 minutes, from 30 minutes to 45 minutes, or from 45 minutes to 60 minutes. In some cases, after the detectable signal is produced (e.g., produced for no more than 60 minutes), production of the detectable signal can be stopped, e.g., by lowering the temperature of the sample (e.g., lowering the temperature to 10° C. or less, 5° C. or less, 4° C. or less, or 0° C.), by adding an inhibitor to the sample, by lyophilizing the sample, by heating the sample to over 40° C., and the like. The measuring step can occur at any time after production of the detectable signal has been stopped. For example, the measuring step can occur from 5 minutes to 48 hours after production of the detectable signal has been stopped. For example, the measuring step can occur from 5 minutes to 15 minutes, from 15 minutes to 30 minutes, from 30 minutes to 60 minutes, from 1 hour to 4 hours, from 4 hours to 8 hours, from 8 hours to 12 hours, from 12 hours to 24 hours, from 24 hours to 36 hours, or from 36 hours to 48 hours, after production of the detectable signal has been stopped. The measuring step can occur more than 48 hours after production of the detectable signal has been stopped.

A method of the present disclosure for detecting a target DNA (single-stranded or double-stranded) in a sample can detect a target DNA with a high degree of sensitivity. In some cases, a method of the present disclosure can be used to detect a target DNA present in a sample comprising a plurality of DNAs (including the target DNA and a plurality of non-target DNAs), where the target DNA is present at one or more copies per 10⁷ non-target DNAs (e.g., one or more copies per 10⁶, 10⁵, 10⁴, 10³, 10², 50, 20, 10, or 5 non-target DNAs). In some cases, a method of the present disclosure can be used to detect a target DNA present in a sample comprising a plurality of DNAs (including the target DNA and a plurality of non-target DNAs), where the target DNA is present at one or more copies per 10¹⁸ non-target DNAs (e.g., one or more copies per 10¹⁵, 10¹², 10⁹, 10⁶, 10⁵, 10⁴, 10³, 10², 50, 20, 10, or 5 non-target DNAs).

In some cases, a method of the present disclosure can detect a target DNA present in a sample, where the target DNA is present at from one copy per 10⁷ non-target DNAs to one copy per 10 non-target DNAs (e.g., from 1 copy per 10⁷ non-target DNAs to 1 copy per 10², 10³, 10⁴, 10⁵, or 10⁶ non-target DNAs; from 1 copy per 10⁶ non-target DNAs to 1 copy per 10, 10², 10³, 10⁴, or 10⁵ non-target DNAs; or from 1 copy per 10⁵ non-target DNAs to 1 copy per 10, 10², 10³, or 10⁴ non-target DNAs).

In some cases, a method of the present disclosure can detect a target DNA present in a sample, where the target DNA is present at from one copy per 10¹⁸ non-target DNAs to one copy per 10 non-target DNAs (e.g., from 1 copy per 10¹⁸, 10¹⁵, 10¹², or 10⁹ non-target DNAs to 1 copy per 10² non-target DNAs; from 1 copy per 10⁷ non-target DNAs to 1 copy per 10², 10³, 10⁴, 10⁵, or 10⁶ non-target DNAs; from 1 copy per 10⁶ non-target DNAs to 1 copy per 10, 10², 10³, 10⁴, or 10⁵ non-target DNAs; or from 1 copy per 10⁵ non-target DNAs to 1 copy per 10, 10², 10³, or 10⁴ non-target DNAs).

In some cases, a method of the present disclosure can detect a target DNA present in a sample, where the target DNA is present at from one copy per 10⁷ non-target DNAs to one copy per 100 non-target DNAs (e.g., from 1 copy per 10⁷ non-target DNAs to 1 copy per 10² non-target DNAs, from 1 copy per 10⁷ non-target DNAs to 1 copy per 10³ non-target DNAs, from 1 copy per 10⁷ non-target DNAs to 1 copy per 10⁴ non-target DNAs, from 1 copy per 10⁷ non-target DNAs to 1 copy per 10⁵ non-target DNAs, from 1 copy per 10⁷ non-target DNAs to 1 copy per 10⁶ non-target DNAs, from 1 copy per 10⁶ non-target DNAs to 1 copy per 100 non-target DNAs, from 1 copy per 10⁶ non-target DNAs to 1 copy per 10² non-target DNAs, from 1 copy per 10⁶ non-target DNAs to 1 copy per 10³ non-target DNAs, from 1 copy per 10⁶ non-target DNAs to 1 copy per 10⁴ non-target DNAs, from 1 copy per 10⁶ non-target DNAs to 1 copy per 10⁵ non-target DNAs, from 1 copy per 10⁵ non-target DNAs to 1 copy per 100 non-target DNAs, from 1 copy per 10⁵ non-target DNAs to 1 copy per 10² non-target DNAs, from 1 copy per 10⁵ non-target DNAs to 1 copy per 10³ non-target DNAs, or from 1 copy per 10⁵ non-target DNAs to 1 copy per 10⁴ non-target DNAs).

In some cases, the threshold of detection, for a subject method of detecting a target DNA in a sample, is 10 nM or less. Thus, e.g., the target DNA can be present in the sample in a concentration of 10 nM or less. The term “threshold of detection” is used herein to describe the minimal amount of target DNA that must be present in a sample in order for detection to occur. Thus, as an illustrative example, when a threshold of detection is 10 nM, then a signal can be detected when a target DNA is present in the sample at a concentration of 10 nM or more. In some cases, a method of the present disclosure has a threshold of detection of 5 nM or less. In some cases, a method of the present disclosure has a threshold of detection of 1 nM or less. In some cases, a method of the present disclosure has a threshold of detection of 0.5 nM or less. In some cases, a method of the present disclosure has a threshold of detection of 0.1 nM or less. In some cases, a method of the present disclosure has a threshold of detection of 0.05 nM or less. In some cases, a method of the present disclosure has a threshold of detection of 0.01 nM or less. In some cases, a method of the present disclosure has a threshold of detection of 0.005 nM or less. In some cases, a method of the present disclosure has a threshold of detection of 0.001 nM or less. In some cases, a method of the present disclosure has a threshold of detection of 0.0005 nM or less. In some cases, a method of the present disclosure has a threshold of detection of 0.0001 nM or less. In some cases, a method of the present disclosure has a threshold of detection of 0.00005 nM or less. In some cases, a method of the present disclosure has a threshold of detection of 0.00001 nM or less. In some cases, a method of the present disclosure has a threshold of detection of 10 pM or less. In some cases, a method of the present disclosure has a threshold of detection of 1 pM or less.
In some cases, a method of the present disclosure has a threshold of detection of 500 fM or less. In some cases, a method of the present disclosure has a threshold of detection of 250 fM or less. In some cases, a method of the present disclosure has a threshold of detection of 100 fM or less. In some cases, a method of the present disclosure has a threshold of detection of 50 fM or less. In some cases, a method of the present disclosure has a threshold of detection of 500 aM (attomolar) or less. In some cases, a method of the present disclosure has a threshold of detection of 250 aM or less. In some cases, a method of the present disclosure has a threshold of detection of 100 aM or less. In some cases, a method of the present disclosure has a threshold of detection of 50 aM or less. In some cases, a method of the present disclosure has a threshold of detection of 10 aM or less. In some cases, a method of the present disclosure has a threshold of detection of 1 aM or less.
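
The thresholds above are stated in a mix of units (nM, pM, fM, aM). As a rough aid to comparing them, the following Python sketch converts a concentration to molar and estimates how many target molecules that concentration corresponds to in a given reaction volume. The function names and the 20 microliter volume are illustrative assumptions and are not taken from the present disclosure.

```python
AVOGADRO = 6.02214076e23  # molecules per mole

# Molar value of each concentration unit used in the text.
UNIT_TO_MOLAR = {"nM": 1e-9, "pM": 1e-12, "fM": 1e-15, "aM": 1e-18}


def to_molar(value, unit):
    """Convert a concentration such as (500, 'fM') to mol/L."""
    return value * UNIT_TO_MOLAR[unit]


def molecules_in_volume(value, unit, volume_microliters):
    """Estimate the number of target molecules in a reaction volume."""
    liters = volume_microliters * 1e-6
    return to_molar(value, unit) * liters * AVOGADRO


if __name__ == "__main__":
    # Hypothetical 20 microliter reaction, chosen only for illustration.
    for value, unit in [(10, "nM"), (1, "pM"), (500, "fM"), (1, "aM")]:
        count = molecules_in_volume(value, unit, 20.0)
        print(f"{value} {unit} in 20 uL is about {count:.3g} molecules")
```

Under these assumptions, 0.00001 nM is the same concentration as 10 fM, and 1 aM in a 20 microliter reaction corresponds to roughly 12 molecules of target DNA.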

In some cases, the threshold of detection (for detecting the target DNA in a subject method) is in a range of from 500 fM to 1 nM (e.g., from 500 fM to 500 pM, from 500 fM to 200 pM, from 500 fM to 100 pM, from 500 fM to 10 pM, from 500 fM to 1 pM, from 800 fM to 1 nM, from 800 fM to 500 pM, from 800 fM to 200 pM, from 800 fM to 100 pM, from 800 fM to 10 pM, from 800 fM to 1 pM, from 1 pM to 1 nM, from 1 pM to 500 pM, from 1 pM to 200 pM, from 1 pM to 100 pM, or from 1 pM to 10 pM) (where the concentration refers to the threshold concentration of target DNA at which the target DNA can be detected). In some cases, a method of the present disclosure has a threshold of detection in a range of from 800 fM to 100 pM. In some cases, a method of the present disclosure has a threshold of detection in a range of from 1 pM to 10 pM. In some cases, a method of the present disclosure has a threshold of detection in a range of from 10 fM to 500 fM, e.g., from 10 fM to 50 fM, from 50 fM to 100 fM, from 100 fM to 250 fM, or from 250 fM to 500 fM.

In some cases, the minimum concentration at which a target DNA can be detected in a sample is in a range of from 500 fM to 1 nM (e.g., from 500 fM to 500 pM, from 500 fM to 200 pM, from 500 fM to 100 pM, from 500 fM to 10 pM, from 500 fM to 1 pM, from 800 fM to 1 nM, from 800 fM to 500 pM, from 800 fM to 200 pM, from 800 fM to 100 pM, from 800 fM to 10 pM, from 800 fM to 1 pM, from 1 pM to 1 nM, from 1 pM to 500 pM, from 1 pM to 200 pM, from 1 pM to 100 pM, or from 1 pM to 10 pM). In some cases, the minimum concentration at which a target DNA can be detected in a sample is in a range of from 800 fM to 100 pM. In some cases, the minimum concentration at which a target DNA can be detected in a sample is in a range of from 1 pM to 10 pM.

In some cases, the threshold of detection (for detecting the target DNA in a subject method) is in a range of from 1 aM to 1 nM (e.g., from 1 aM to 500 pM, from 1 aM to 200 pM, from 1 aM to 100 pM, from 1 aM to 10 pM, from 1 aM to 1 pM, from 100 aM to 1 nM, from 100 aM to 500 pM, from 100 aM to 200 pM, from 100 aM to 100 pM, from 100 aM to 10 pM, from 100 aM to 1 pM, from 250 aM to 1 nM, from 250 aM to 500 pM, from 250 aM to 200 pM, from 250 aM to 100 pM, from 250 aM to 10 pM, from 250 aM to 1 pM, from 500 aM to 1 nM, from 500 aM to 500 pM, from 500 aM to 200 pM, from 500 aM to 100 pM, from 500 aM to 10 pM, from 500 aM to 1 pM, from 750 aM to 1 nM, from 750 aM to 500 pM, from 750 aM to 200 pM, from 750 aM to 100 pM, from 750 aM to 10 pM, from 750 aM to 1 pM, from 1 fM to 1 nM, from 1 fM to 500 pM, from 1 fM to 200 pM, from 1 fM to 100 pM, from 1 fM to 10 pM, from 1 fM to 1 pM, from 500 fM to 500 pM, from 500 fM to 200 pM, from 500 fM to 100 pM, from 500 fM to 10 pM, from 500 fM to 1 pM, from 800 fM to 1 nM, from 800 fM to 500 pM, from 800 fM to 200 pM, from 800 fM to 100 pM, from 800 fM to 10 pM, from 800 fM to 1 pM, from 1 pM to 1 nM, from 1 pM to 500 pM, from 1 pM to 200 pM, from 1 pM to 100 pM, or from 1 pM to 10 pM) (where the concentration refers to the threshold concentration of target DNA at which the target DNA can be detected). In some cases, a method of the present disclosure has a threshold of detection in a range of from 1 aM to 800 aM. In some cases, a method of the present disclosure has a threshold of detection in a range of from 50 aM to 1 pM. In some cases, a method of the present disclosure has a threshold of detection in a range of from 50 aM to 500 fM.

In some cases, a target DNA is present in a sample in a range of from 1 aM to 1 nM (e.g., from 1 aM to 500 pM, from 1 aM to 200 pM, from 1 aM to 100 pM, from 1 aM to 10 pM, from 1 aM to 1 pM, from 100 aM to 1 nM, from 100 aM to 500 pM, from 100 aM to 200 pM, from 100 aM to 100 pM, from 100 aM to 10 pM, from 100 aM to 1 pM, from 250 aM to 1 nM, from 250 aM to 500 pM, from 250 aM to 200 pM, from 250 aM to 100 pM, from 250 aM to 10 pM, from 250 aM to 1 pM, from 500 aM to 1 nM, from 500 aM to 500 pM, from 500 aM to 200 pM, from 500 aM to 100 pM, from 500 aM to 10 pM, from 500 aM to 1 pM, from 750 aM to 1 nM, from 750 aM to 500 pM, from 750 aM to 200 pM, from 750 aM to 100 pM, from 750 aM to 10 pM, from 750 aM to 1 pM, from 1 fM to 1 nM, from 1 fM to 500 pM, from 1 fM to 200 pM, from 1 fM to 100 pM, from 1 fM to 10 pM, from 1 fM to 1 pM, from 500 fM to 500 pM, from 500 fM to 200 pM, from 500 fM to 100 pM, from 500 fM to 10 pM, from 500 fM to 1 pM, from 800 fM to 1 nM, from 800 fM to 500 pM, from 800 fM to 200 pM, from 800 fM to 100 pM, from 800 fM to 10 pM, from 800 fM to 1 pM, from 1 pM to 1 nM, from 1 pM to 500 pM, from 1 pM to 200 pM, from 1 pM to 100 pM, or from 1 pM to 10 pM). In some cases, a target DNA is present in a sample in a range of from 1 aM to 800 aM. In some cases, a target DNA is present in a sample in a range of from 50 aM to 1 pM. In some cases, a target DNA is present in a sample in a range of from 50 aM to 500 fM.

In some cases, the minimum concentration at which a target DNA can be detected in a sample is in a range of from 1 aM to 1 nM (e.g., from 1 aM to 500 pM, from 1 aM to 200 pM, from 1 aM to 100 pM, from 1 aM to 10 pM, from 1 aM to 1 pM, from 100 aM to 1 nM, from 100 aM to 500 pM, from 100 aM to 200 pM, from 100 aM to 100 pM, from 100 aM to 10 pM, from 100 aM to 1 pM, from 250 aM to 1 nM, from 250 aM to 500 pM, from 250 aM to 200 pM, from 250 aM to 100 pM, from 250 aM to 10 pM, from 250 aM to 1 pM, from 500 aM to 1 nM, from 500 aM to 500 pM, from 500 aM to 200 pM, from 500 aM to 100 pM, from 500 aM to 10 pM, from 500 aM to 1 pM, from 750 aM to 1 nM, from 750 aM to 500 pM, from 750 aM to 200 pM, from 750 aM to 100 pM, from 750 aM to 10 pM, from 750 aM to 1 pM, from 1 fM to 1 nM, from 1 fM to 500 pM, from 1 fM to 200 pM, from 1 fM to 100 pM, from 1 fM to 10 pM, from 1 fM to 1 pM, from 500 fM to 500 pM, from 500 fM to 200 pM, from 500 fM to 100 pM, from 500 fM to 10 pM, from 500 fM to 1 pM, from 800 fM to 1 nM, from 800 fM to 500 pM, from 800 fM to 200 pM, from 800 fM to 100 pM, from 800 fM to 10 pM, from 800 fM to 1 pM, from 1 pM to 1 nM, from 1 pM to 500 pM, from 1 pM to 200 pM, from 1 pM to 100 pM, or from 1 pM to 10 pM). In some cases, the minimum concentration at which a target DNA can be detected in a sample is in a range of from 1 aM to 500 pM. In some cases, the minimum concentration at which a target DNA can be detected in a sample is in a range of from 100 aM to 500 pM.

In some cases, a target DNA is present in a sample in a range of from 1 aM to 1 nM (e.g., from 1 aM to 500 pM, from 1 aM to 200 pM, from 1 aM to 100 pM, from 1 aM to 10 pM, from 1 aM to 1 pM, from 100 aM to 1 nM, from 100 aM to 500 pM, from 100 aM to 200 pM, from 100 aM to 100 pM, from 100 aM to 10 pM, from 100 aM to 1 pM, from 250 aM to 1 nM, from 250 aM to 500 pM, from 250 aM to 200 pM, from 250 aM to 100 pM, from 250 aM to 10 pM, from 250 aM to 1 pM, from 500 aM to 1 nM, from 500 aM to 500 pM, from 500 aM to 200 pM, from 500 aM to 100 pM, from 500 aM to 10 pM, from 500 aM to 1 pM, from 750 aM to 1 nM, from 750 aM to 500 pM, from 750 aM to 200 pM, from 750 aM to 100 pM, from 750 aM to 10 pM, from 750 aM to 1 pM, from 1 fM to 1 nM, from 1 fM to 500 pM, from 1 fM to 200 pM, from 1 fM to 100 pM, from 1 fM to 10 pM, from 1 fM to 1 pM, from 500 fM to 500 pM, from 500 fM to 200 pM, from 500 fM to 100 pM, from 500 fM to 10 pM, from 500 fM to 1 pM, from 800 fM to 1 nM, from 800 fM to 500 pM, from 800 fM to 200 pM, from 800 fM to 100 pM, from 800 fM to 10 pM, from 800 fM to 1 pM, from 1 pM to 1 nM, from 1 pM to 500 pM, from 1 pM to 200 pM, from 1 pM to 100 pM, or from 1 pM to 10 pM). In some cases, a target DNA is present in a sample in a range of from 1 aM to 500 pM. In some cases, a target DNA is present in a sample in a range of from 100 aM to 500 pM.

In some cases, a subject composition or method exhibits an attomolar (aM) sensitivity of detection. In some cases, a subject composition or method exhibits a femtomolar (fM) sensitivity of detection. In some cases, a subject composition or method exhibits a picomolar (pM) sensitivity of detection. In some cases, a subject composition or method exhibits a nanomolar (nM) sensitivity of detection.

Target DNA

A target DNA can be single stranded (ssDNA) or double stranded (dsDNA). When the target DNA is single stranded, there is no preference or requirement for a PAM sequence in the target DNA. However, when the target DNA is dsDNA, a PAM is usually present adjacent to the target sequence of the target DNA (e.g., see discussion of the PAM elsewhere herein). The source of the target DNA can be the same as the source of the sample, e.g., as described below.

The source of the target DNA can be any source. In some cases, the target DNA is a viral DNA (e.g., a genomic DNA of a DNA virus). As such, a subject method can be for detecting the presence of a viral DNA amongst a population of nucleic acids (e.g., in a sample). A subject method can also be used for the cleavage of non-target ssDNAs in the presence of a target DNA. For example, if a method takes place in a cell, a subject method can be used to promiscuously cleave non-target ssDNAs in the cell (ssDNAs that do not hybridize with the guide sequence of the guide RNA) when a particular target DNA is present in the cell (e.g., when the cell is infected with a virus and viral target DNA is detected).

Examples of possible target DNAs include, but are not limited to, viral DNAs such as: a papovavirus (e.g., human papillomavirus (HPV), polyomavirus); a hepadnavirus (e.g., Hepatitis B Virus (HBV)); a herpesvirus (e.g., herpes simplex virus (HSV), varicella zoster virus (VZV), Epstein-Barr virus (EBV), cytomegalovirus (CMV), herpes lymphotropic virus, Pityriasis Rosea, Kaposi's sarcoma-associated herpesvirus); an adenovirus (e.g., atadenovirus, aviadenovirus, ichtadenovirus, mastadenovirus, siadenovirus); a poxvirus (e.g., smallpox, vaccinia virus, cowpox virus, monkeypox virus, orf virus, pseudocowpox, bovine papular stomatitis virus; tanapox virus, yaba monkey tumor virus; molluscum contagiosum virus (MCV)); a parvovirus (e.g., adeno-associated virus (AAV), Parvovirus B19, human bocavirus, bufavirus, human parv4 G1); Geminiviridae; Nanoviridae; Phycodnaviridae; and the like. In some cases, the target DNA is parasite DNA. In some cases, the target DNA is bacterial DNA, e.g., DNA of a pathogenic bacterium.

Samples

A subject sample includes nucleic acid (e.g., a plurality of nucleic acids). The term “plurality” is used herein to mean two or more. Thus, in some cases a sample includes two or more (e.g., 3 or more, 5 or more, 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 1,000 or more, or 5,000 or more) nucleic acids (e.g., DNAs). A subject method can be used as a very sensitive way to detect a target DNA present in a sample (e.g., in a complex mixture of nucleic acids such as DNAs). In some cases, the sample includes 5 or more DNAs (e.g., 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 1,000 or more, or 5,000 or more DNAs) that differ from one another in sequence. In some cases, the sample includes 10 or more, 20 or more, 50 or more, 100 or more, 500 or more, 10³ or more, 5×10³ or more, 10⁴ or more, 5×10⁴ or more, 10⁵ or more, 5×10⁵ or more, 10⁶ or more, 5×10⁶ or more, or 10⁷ or more, DNAs. In some cases, the sample comprises from 10 to 20, from 20 to 50, from 50 to 100, from 100 to 500, from 500 to 10³, from 10³ to 5×10³, from 5×10³ to 10⁴, from 10⁴ to 5×10⁴, from 5×10⁴ to 10⁵, from 10⁵ to 5×10⁵, from 5×10⁵ to 10⁶, from 10⁶ to 5×10⁶, or from 5×10⁶ to 10⁷, or more than 10⁷, DNAs. In some cases, the sample comprises from 5 to 10⁷ DNAs (e.g., that differ from one another in sequence) (e.g., from 5 to 10⁶, from 5 to 10⁵, from 5 to 50,000, from 5 to 30,000, from 10 to 10⁶, from 10 to 10⁵, from 10 to 50,000, from 10 to 30,000, from 20 to 10⁶, from 20 to 10⁵, from 20 to 50,000, or from 20 to 30,000 DNAs). In some cases, the sample includes 20 or more DNAs that differ from one another in sequence. In some cases, the sample includes DNAs from a cell lysate (e.g., a eukaryotic cell lysate, a mammalian cell lysate, a human cell lysate, a prokaryotic cell lysate, a plant cell lysate, and the like). For example, in some cases the sample includes DNA from a cell such as a eukaryotic cell, e.g., a mammalian cell such as a human cell.

The term “sample” is used herein to mean any sample that includes DNA (e.g., in order to determine whether a target DNA is present among a population of DNAs). The sample can be derived from any source, e.g., the sample can be a synthetic combination of purified DNAs; the sample can be a cell lysate, a DNA-enriched cell lysate, or DNAs isolated and/or purified from a cell lysate. The sample can be from a patient (e.g., for the purpose of diagnosis). The sample can be from permeabilized cells. The sample can be from crosslinked cells. The sample can be in tissue sections. The sample can be from tissues prepared by crosslinking followed by delipidation and adjustment to make a uniform refractive index. Examples of tissue preparation by crosslinking followed by delipidation and adjustment to make a uniform refractive index have been described in, for example, Shah et al., Development (2016) 143, 2862-2867 doi:10.1242/dev.138560.

A “sample” can include a target DNA and a plurality of non-target DNAs. In some cases, the target DNA is present in the sample at one copy per 10 non-target DNAs, one copy per 20 non-target DNAs, one copy per 25 non-target DNAs, one copy per 50 non-target DNAs, one copy per 100 non-target DNAs, one copy per 500 non-target DNAs, one copy per 10³ non-target DNAs, one copy per 5×10³ non-target DNAs, one copy per 10⁴ non-target DNAs, one copy per 5×10⁴ non-target DNAs, one copy per 10⁵ non-target DNAs, one copy per 5×10⁵ non-target DNAs, one copy per 10⁶ non-target DNAs, or less than one copy per 10⁶ non-target DNAs. In some cases, the target DNA is present in the sample at from one copy per 10 non-target DNAs to 1 copy per 20 non-target DNAs, from 1 copy per 20 non-target DNAs to 1 copy per 50 non-target DNAs, from 1 copy per 50 non-target DNAs to 1 copy per 100 non-target DNAs, from 1 copy per 100 non-target DNAs to 1 copy per 500 non-target DNAs, from 1 copy per 500 non-target DNAs to 1 copy per 10³ non-target DNAs, from 1 copy per 10³ non-target DNAs to 1 copy per 5×10³ non-target DNAs, from 1 copy per 5×10³ non-target DNAs to 1 copy per 10⁴ non-target DNAs, from 1 copy per 10⁴ non-target DNAs to 1 copy per 10⁵ non-target DNAs, from 1 copy per 10⁵ non-target DNAs to 1 copy per 10⁶ non-target DNAs, or from 1 copy per 10⁶ non-target DNAs to 1 copy per 10⁷ non-target DNAs.
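
The copy ratios above can be read as a fraction of all DNA molecules in the sample. The following Python sketch makes that arithmetic explicit. The function names and the example molecule count are hypothetical and serve only to illustrate the ratios recited above.

```python
def target_fraction(non_target_per_copy):
    """Fraction of all DNA molecules that are target DNA,
    given one target copy per `non_target_per_copy` non-target DNAs."""
    if non_target_per_copy <= 0:
        raise ValueError("non_target_per_copy must be positive")
    return 1.0 / (non_target_per_copy + 1.0)


def expected_target_copies(total_molecules, non_target_per_copy):
    """Expected number of target copies among `total_molecules` DNAs."""
    return total_molecules * target_fraction(non_target_per_copy)


if __name__ == "__main__":
    # Hypothetical sample of 10**9 DNA molecules in total.
    for ratio in (10, 10**3, 10**6):
        copies = expected_target_copies(10**9, ratio)
        print(f"1 copy per {ratio} non-target DNAs: about {copies:.3g} copies")
```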

Suitable samples include but are not limited to saliva, blood, serum, plasma, urine, aspirate, and biopsy samples. Thus, the term “sample” with respect to a patient encompasses blood and other liquid samples of biological origin, solid tissue samples such as a biopsy specimen or tissue cultures or cells derived therefrom and the progeny thereof. The definition also includes samples that have been manipulated in any way after their procurement, such as by treatment with reagents, washing, or enrichment for certain cell populations, such as cancer cells. The definition also includes samples that have been enriched for particular types of molecules, e.g., DNAs. The term “sample” encompasses biological samples such as a clinical sample such as blood, plasma, serum, aspirate, cerebrospinal fluid (CSF), and also includes tissue obtained by surgical resection, tissue obtained by biopsy, cells in culture, cell supernatants, cell lysates, tissue samples, organs, bone marrow, and the like. A “biological sample” includes biological fluids derived therefrom (e.g., cancerous cell, infected cell, etc.), e.g., a sample comprising DNAs that is obtained from such cells (e.g., a cell lysate or other cell extract comprising DNAs).

A sample can comprise, or can be obtained from, any of a variety of cells, tissues, organs, or acellular fluids. Suitable sample sources include eukaryotic cells, bacterial cells, and archaeal cells. Suitable sample sources include single-celled organisms and multi-cellular organisms. Suitable sample sources include single-cell eukaryotic organisms; a plant or a plant cell; an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Sargassum patens C. Agardh, and the like; a fungal cell (e.g., a yeast cell); an animal cell, tissue, or organ; a cell, tissue, or organ from an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, an insect, an arachnid, etc.); a cell, tissue, fluid, or organ from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal); a cell, tissue, fluid, or organ from a mammal (e.g., a human; a non-human primate; an ungulate; a feline; a bovine; an ovine; a caprine; etc.). Suitable sample sources include nematodes, protozoans, and the like. Suitable sample sources include parasites such as helminths, malarial parasites, etc.

Suitable sample sources include a cell, tissue, or organism of any of the six kingdoms, e.g., Bacteria (e.g., Eubacteria); Archaebacteria; Protista; Fungi; Plantae; and Animalia. Suitable sample sources include plant-like members of the kingdom Protista, including, but not limited to, algae (e.g., green algae, red algae, glaucophytes, cyanobacteria); fungus-like members of Protista, e.g., slime molds, water molds, etc.; animal-like members of Protista, e.g., flagellates (e.g., Euglena), amoeboids (e.g., amoeba), sporozoans (e.g., Apicomplexa, Myxozoa, Microsporidia), and ciliates (e.g., Paramecium). Suitable sample sources include members of the kingdom Fungi, including, but not limited to, members of any of the phyla: Basidiomycota (club fungi; e.g., members of Agaricus, Amanita, Boletus, Cantharellus, etc.); Ascomycota (sac fungi, including, e.g., Saccharomyces); Mycophycophyta (lichens); Zygomycota (conjugation fungi); and Deuteromycota. Suitable sample sources include members of the kingdom Plantae, including, but not limited to, members of any of the following divisions: Bryophyta (e.g., mosses), Anthocerotophyta (e.g., hornworts), Hepaticophyta (e.g., liverworts), Lycophyta (e.g., club mosses), Sphenophyta (e.g., horsetails), Psilophyta (e.g., whisk ferns), Ophioglossophyta, Pterophyta (e.g., ferns), Cycadophyta, Ginkgophyta, Pinophyta, Gnetophyta, and Magnoliophyta (e.g., flowering plants).
Suitable sample sources include members of the kingdom Animalia, including, but not limited to, members of any of the following phyla: Porifera (sponges); Placozoa; Orthonectida (parasites of marine invertebrates); Rhombozoa; Cnidaria (corals, anemones, jellyfish, sea pens, sea pansies, sea wasps); Ctenophora (comb jellies); Platyhelminthes (flatworms); Nemertina (ribbon worms); Gnathostomulida (jawed worms); Gastrotricha; Rotifera; Priapulida; Kinorhyncha; Loricifera; Acanthocephala; Entoprocta; Nematoda; Nematomorpha; Cycliophora; Mollusca (mollusks); Sipuncula (peanut worms); Annelida (segmented worms); Tardigrada (water bears); Onychophora (velvet worms); Arthropoda (including the subphyla: Chelicerata, Myriapoda, Hexapoda, and Crustacea, where the Chelicerata include, e.g., arachnids, Merostomata, and Pycnogonida, where the Myriapoda include, e.g., Chilopoda (centipedes), Diplopoda (millipedes), Pauropoda, and Symphyla, where the Hexapoda include insects, and where the Crustacea include shrimp, krill, barnacles, etc.); Phoronida; Ectoprocta (moss animals); Brachiopoda; Echinodermata (e.g., starfish, sea daisies, feather stars, sea urchins, sea cucumbers, brittle stars, brittle baskets, etc.); Chaetognatha (arrow worms); Hemichordata (acorn worms); and Chordata. Suitable members of Chordata include any member of the following subphyla: Urochordata (sea squirts; including Ascidiacea, Thaliacea, and Larvacea); Cephalochordata (lancelets); Myxini (hagfish); and Vertebrata, where members of Vertebrata include, e.g., members of Petromyzontida (lampreys), Chondrichthyes (cartilaginous fish), Actinopterygii (ray-finned fish), Actinistia (coelacanths), Dipnoi (lungfish), Reptilia (reptiles, e.g., snakes, alligators, crocodiles, lizards, etc.), Aves (birds); and Mammalia (mammals). Suitable plants include any monocotyledon and any dicotyledon.

Suitable sources of a sample include cells, fluid, tissue, or organ taken from an organism; from a particular cell or group of cells isolated from an organism; etc. For example, where the organism is a plant, suitable sources include xylem, the phloem, the cambium layer, leaves, roots, etc. Where the organism is an animal, suitable sources include particular tissues (e.g., lung, liver, heart, kidney, brain, spleen, skin, fetal tissue, etc.), or a particular cell type (e.g., neuronal cells, epithelial cells, endothelial cells, astrocytes, macrophages, glial cells, islet cells, T lymphocytes, B lymphocytes, etc.).

In some cases, the source of the sample is (or is suspected of being) a diseased cell, fluid, tissue, or organ. In some cases, the source of the sample is a normal (non-diseased) cell, fluid, tissue, or organ. In some cases, the source of the sample is (or is suspected of being) a pathogen-infected cell, tissue, or organ. For example, the source of a sample can be an individual who may or may not be infected, and the sample could be any biological sample (e.g., blood, saliva, biopsy, plasma, serum, bronchoalveolar lavage, sputum, a fecal sample, cerebrospinal fluid, a fine needle aspirate, a swab sample (e.g., a buccal swab, a cervical swab, a nasal swab), interstitial fluid, synovial fluid, nasal discharge, tears, buffy coat, a mucous membrane sample, an epithelial cell sample (e.g., epithelial cell scraping), etc.) collected from the individual. In some cases, the sample is a cell-free liquid sample. In some cases, the sample is a liquid sample that can comprise cells. Pathogens include viruses, fungi, helminths, protozoa, malarial parasites, Plasmodium parasites, Toxoplasma parasites, Schistosoma parasites, and the like. “Helminths” include roundworms, heartworms, and phytophagous nematodes (Nematoda), flukes (Trematoda), Acanthocephala, and tapeworms (Cestoda). Protozoan infections include infections from Giardia spp., Trichomonas spp., African trypanosomiasis, amoebic dysentery, babesiosis, balantidial dysentery, Chagas disease, coccidiosis, malaria and toxoplasmosis. Examples of pathogens such as parasitic/protozoan pathogens include, but are not limited to: Plasmodium falciparum, Plasmodium vivax, Trypanosoma cruzi and Toxoplasma gondii. Fungal pathogens include, but are not limited to: Cryptococcus neoformans, Histoplasma capsulatum, Coccidioides immitis, Blastomyces dermatitidis, Chlamydia trachomatis, and Candida albicans.
Pathogenic viruses include, e.g., immunodeficiency virus (e.g., HIV); influenza virus; dengue; West Nile virus; herpes virus; yellow fever virus; Hepatitis Virus C; Hepatitis Virus A; Hepatitis Virus B; papillomavirus; and the like. Pathogenic viruses can include DNA viruses such as: a papovavirus (e.g., human papillomavirus (HPV), polyomavirus); a hepadnavirus (e.g., Hepatitis B Virus (HBV)); a herpesvirus (e.g., herpes simplex virus (HSV), varicella zoster virus (VZV), Epstein-Barr virus (EBV), cytomegalovirus (CMV), herpes lymphotropic virus, Pityriasis Rosea, Kaposi's sarcoma-associated herpesvirus); an adenovirus (e.g., atadenovirus, aviadenovirus, ichtadenovirus, mastadenovirus, siadenovirus); a poxvirus (e.g., smallpox, vaccinia virus, cowpox virus, monkeypox virus, orf virus, pseudocowpox, bovine papular stomatitis virus; tanapox virus, yaba monkey tumor virus; molluscum contagiosum virus (MCV)); a parvovirus (e.g., adeno-associated virus (AAV), Parvovirus B19, human bocavirus, bufavirus, human parv4 G1); Geminiviridae; Nanoviridae; Phycodnaviridae; and the like.
Pathogens can include, e.g., DNA viruses [e.g.: a papovavirus (e.g., human papillomavirus (HPV), polyomavirus); a hepadnavirus (e.g., Hepatitis B Virus (HBV)); a herpesvirus (e.g., herpes simplex virus (HSV), varicella zoster virus (VZV), Epstein-Barr virus (EBV), cytomegalovirus (CMV), herpes lymphotropic virus, Pityriasis Rosea, Kaposi's sarcoma-associated herpesvirus); an adenovirus (e.g., atadenovirus, aviadenovirus, ichtadenovirus, mastadenovirus, siadenovirus); a poxvirus (e.g., smallpox, vaccinia virus, cowpox virus, monkeypox virus, orf virus, pseudocowpox, bovine papular stomatitis virus; tanapox virus, yaba monkey tumor virus; molluscum contagiosum virus (MCV)); a parvovirus (e.g., adeno-associated virus (AAV), Parvovirus B19, human bocavirus, bufavirus, human parv4 G1); Geminiviridae; Nanoviridae; Phycodnaviridae; and the like], Mycobacterium tuberculosis, Streptococcus agalactiae, methicillin-resistant Staphylococcus aureus, Legionella pneumophila, Streptococcus pyogenes, Escherichia coli, Neisseria gonorrhoeae, Neisseria meningitidis, Pneumococcus, Cryptococcus neoformans, Histoplasma capsulatum, Hemophilus influenzae B, Treponema pallidum, Lyme disease spirochetes, Pseudomonas aeruginosa, Mycobacterium leprae, Brucella abortus, rabies virus, influenza virus, cytomegalovirus, herpes simplex virus I, herpes simplex virus II, human serum parvo-like virus, respiratory syncytial virus, varicella-zoster virus, hepatitis B virus, hepatitis C virus, measles virus, adenovirus, human T-cell leukemia viruses, Epstein-Barr virus, murine leukemia virus, mumps virus, vesicular stomatitis virus, Sindbis virus, lymphocytic choriomeningitis virus, wart virus, blue tongue virus, Sendai virus, feline leukemia virus, Reovirus, polio virus, simian virus 40, mouse mammary tumor virus, dengue virus, rubella virus, West Nile virus, Plasmodium falciparum, Plasmodium vivax, Toxoplasma gondii, Trypanosoma rangeli, Trypanosoma cruzi, Trypanosoma rhodesiense, Trypanosoma brucei, Schistosoma mansoni, Schistosoma japonicum, Babesia bovis, Eimeria tenella, Onchocerca volvulus, Leishmania tropica, Mycobacterium tuberculosis, Trichinella spiralis, Theileria parva, Taenia hydatigena, Taenia ovis, Taenia saginata, Echinococcus granulosus, Mesocestoides corti, Mycoplasma arthritidis, M. hyorhinis, M. orale, M. arginini, Acholeplasma laidlawii, M. salivarium and M. pneumoniae.

Measuring a Detectable Signal

In some cases, a subject method includes a step of measuring (e.g., measuring a detectable signal produced by CasZ-mediated ssDNA cleavage). Because a CasZ polypeptide cleaves non-targeted ssDNA once activated, which occurs when a guide RNA hybridizes with a target DNA in the presence of a CasZ polypeptide (and, in some cases, also including a trancRNA), a detectable signal can be any signal that is produced when ssDNA is cleaved. For example, in some cases the step of measuring can include one or more of: gold nanoparticle based detection (e.g., see Xu et al., Angew Chem Int Ed Engl. 2007; 46(19):3468-70; and Xia et al., Proc Natl Acad Sci USA. 2010 Jun. 15; 107(24):10837-41), fluorescence polarization, colloid phase transition/dispersion (e.g., Baksh et al., Nature. 2004 Jan. 8; 427(6970):139-41), electrochemical detection, semiconductor-based sensing (e.g., Rothberg et al., Nature. 2011 Jul. 20; 475(7356):348-52; e.g., one could use a phosphatase to generate a pH change after ssDNA cleavage reactions, by opening 2′-3′ cyclic phosphates, and by releasing inorganic phosphate into solution), and detection of a labeled detector ssDNA (see elsewhere herein for more details). The readout of such detection methods can be any convenient readout. Examples of possible readouts include but are not limited to: a measured amount of detectable fluorescent signal; a visual analysis of bands on a gel (e.g., bands that represent cleaved product versus uncleaved substrate), a visual or sensor based detection of the presence or absence of a color (i.e., color detection method), and the presence or absence of (or a particular amount of) an electrical signal.

The measuring can in some cases be quantitative, e.g., in the sense that the amount of signal detected can be used to determine the amount of target DNA present in the sample. The measuring can in some cases be qualitative, e.g., in the sense that the presence or absence of detectable signal can indicate the presence or absence of targeted DNA (e.g., virus, SNP, etc.). In some cases, a detectable signal will not be present (e.g., above a given threshold level) unless the targeted DNA(s) (e.g., virus, SNP, etc.) is present above a particular threshold concentration. In some cases, the threshold of detection can be titrated by modifying the amount of CasZ polypeptide, guide RNA, sample volume, and/or detector ssDNA (if one is used). As such, for example, as would be understood by one of ordinary skill in the art, a number of controls can be used if desired in order to set up one or more reactions, each set up to detect a different threshold level of target DNA, and thus such a series of reactions could be used to determine the amount of target DNA present in a sample (e.g., one could use such a series of reactions to determine that a target DNA is present in the sample ‘at a concentration of at least X’). Non-limiting examples of applications of/uses for the compositions and methods of the disclosure include single-nucleotide polymorphism (SNP) detection, cancer screening, detection of bacterial infection, detection of antibiotic resistance, detection of viral infection, and the like. The compositions and methods of this disclosure can be used to detect any DNA target. For example, any virus that integrates nucleic acid material into the genome can be detected because a subject sample can include cellular genomic DNA, and the guide RNA can be designed to detect integrated nucleotide sequence. A method of the present disclosure in some cases does not include an amplification step. A method of the present disclosure in some cases includes an amplification step.
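
The idea of using a series of reactions, each set up to detect a different threshold level, to conclude that a target DNA is present ‘at a concentration of at least X’ can be sketched as follows. This is a minimal illustration only: the function name, the data layout, and the example thresholds are assumptions and are not part of the present disclosure.

```python
def bracket_concentration(results):
    """Bracket the target DNA concentration from a series of reactions.

    `results` is a list of (threshold_molar, signal_detected) pairs, one
    per reaction, where each reaction was set up to give a signal only
    when the target is at or above its threshold.

    Returns (lower_bound, upper_bound). The target is present at a
    concentration of at least `lower_bound` and below `upper_bound`;
    either bound is None when the series does not establish it.
    """
    positives = [t for t, detected in results if detected]
    negatives = [t for t, detected in results if not detected]
    lower = max(positives) if positives else None
    upper = min(negatives) if negatives else None
    if lower is not None and upper is not None and lower >= upper:
        raise ValueError("inconsistent series: a higher-threshold reaction "
                         "gave a signal while a lower-threshold one did not")
    return lower, upper


if __name__ == "__main__":
    # Hypothetical series with thresholds of 1 fM, 1 pM, and 1 nM.
    series = [(1e-15, True), (1e-12, True), (1e-9, False)]
    print(bracket_concentration(series))
```

In this hypothetical series the target would be reported as present at a concentration of at least 1 pM and below 1 nM.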

In some cases, a method of the present disclosure can be used todetermine the amount of a target DNA in a sample (e.g., a samplecomprising the target DNA and a plurality of non-target DNAs).Determining the amount of a target DNA in a sample can comprisecomparing the amount of detectable signal generated from a test sampleto the amount of detectable signal generated from a reference sample.Determining the amount of a target DNA in a sample can comprise:measuring the detectable signal to generate a test measurement;measuring a detectable signal produced by a reference sample to generatea reference measurement; and comparing the test measurement to thereference measurement to determine an amount of target DNA present inthe sample.

For example, in some cases, a method of the present disclosure for determining the amount of a target DNA in a sample comprises: a) contacting the sample (e.g., a sample comprising the target DNA and a plurality of non-target DNAs) with: (i) a guide RNA that hybridizes with the target DNA, (ii) a CasZ polypeptide that cleaves DNAs present in the sample, and (iii) a detector ssDNA; b) measuring a detectable signal produced by CasZ polypeptide-mediated ssDNA cleavage (e.g., cleavage of the detector ssDNA), generating a test measurement; c) measuring a detectable signal produced by a reference sample to generate a reference measurement; and d) comparing the test measurement to the reference measurement to determine an amount of target DNA present in the sample.

As another example, in some cases, a method of the present disclosure for determining the amount of a target DNA in a sample comprises: a) contacting the sample (e.g., a sample comprising the target DNA and a plurality of non-target DNAs) with: (i) a guide RNA that hybridizes with the target DNA, (ii) a CasZ polypeptide that cleaves DNAs present in the sample, (iii) a tranc RNA, and (iv) a detector ssDNA; b) measuring a detectable signal produced by CasZ polypeptide-mediated ssDNA cleavage (e.g., cleavage of the detector ssDNA), generating a test measurement; c) measuring a detectable signal produced by a reference sample to generate a reference measurement; and d) comparing the test measurement to the reference measurement to determine an amount of target DNA present in the sample.
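
The comparison of a test measurement to reference measurements in step d) can be sketched, for illustration only, as interpolation against a set of reference samples of known amount. The Python below is a minimal sketch under assumptions that are not part of the present disclosure: the function name is hypothetical, the signal is assumed to increase strictly with the amount of target DNA over the reference range, and linear interpolation between adjacent reference points is assumed to be adequate.

```python
from bisect import bisect_left
from typing import Sequence


def estimate_amount(
    test_signal: float,
    ref_amounts: Sequence[float],
    ref_signals: Sequence[float],
) -> float:
    """Interpolate a target DNA amount from reference measurements.

    ref_amounts and ref_signals are paired and ordered so that ref_signals
    is strictly increasing. A test signal outside the reference range is
    clamped to the nearest reference amount rather than extrapolated.
    """
    if len(ref_amounts) != len(ref_signals) or len(ref_signals) < 2:
        raise ValueError("need at least two paired reference points")
    if test_signal <= ref_signals[0]:
        return ref_amounts[0]
    if test_signal >= ref_signals[-1]:
        return ref_amounts[-1]
    # Locate the pair of reference points that bracket the test signal.
    i = bisect_left(ref_signals, test_signal)
    s0, s1 = ref_signals[i - 1], ref_signals[i]
    a0, a1 = ref_amounts[i - 1], ref_amounts[i]
    return a0 + (a1 - a0) * (test_signal - s0) / (s1 - s0)


# Hypothetical reference samples (arbitrary units of amount and signal).
print(estimate_amount(150.0, [0.0, 10.0, 20.0], [100.0, 200.0, 300.0]))
```

Clamping is a deliberately conservative choice here; a test signal above the highest reference measurement indicates only that the amount is at least the highest reference amount.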

Amplification of Nucleic Acids in the Sample

In some embodiments, sensitivity of a subject composition and/or method (e.g., for detecting the presence of a target DNA, such as viral DNA or a SNP, in cellular genomic DNA) can be increased by coupling detection with nucleic acid amplification. In some cases, the nucleic acids in a sample are amplified prior to contact with a CasZ polypeptide that cleaves ssDNA (e.g., amplification of nucleic acids in the sample can begin prior to contact with a CasZ polypeptide). In some cases, the nucleic acids in a sample are amplified simultaneously with contact with a CasZ polypeptide. For example, in some cases a subject method includes amplifying nucleic acids of a sample (e.g., by contacting the sample with amplification components) prior to contacting the amplified sample with a CasZ polypeptide. In some cases, a subject method includes contacting a sample with amplification components at the same time as (simultaneously with) contacting the sample with a CasZ polypeptide. If all components are added simultaneously (amplification components and detection components such as a CasZ polypeptide, a guide RNA, and a detector DNA), it is possible that the trans-cleavage activity of the CasZ polypeptide will begin to degrade the nucleic acids of the sample at the same time the nucleic acids are undergoing amplification. However, even if this is the case, amplifying and detecting simultaneously can still increase sensitivity compared to performing the method without amplification.

In some cases, specific sequences (e.g., sequences of a virus, sequencesthat include a SNP of interest) are amplified from the sample, e.g.,using primers. As such, a sequence to which the guide RNA will hybridizecan be amplified in order to increase sensitivity of a subject detectionmethod—this could achieve biased amplification of a desired sequence inorder to increase the number of copies of the sequence of interestpresent in the sample relative to other sequences present in the sample.As one illustrative example, if a subject method is being used todetermine whether a given sample includes a particular virus (or aparticular SNP), a desired region of viral sequence (or non-viralgenomic sequence) can be amplified, and the region amplified willinclude the sequence that would hybridize to the guide RNA if the viralsequence (or SNP) were in fact present in the sample.

As noted, in some cases the nucleic acids are amplified (e.g., bycontact with amplification components) prior to contacting the amplifiednucleic acids with a CasZ polypeptide. In some cases, amplificationoccurs for 10 seconds or more, (e.g., 30 seconds or more, 45 seconds ormore, 1 minute or more, 2 minutes or more, 3 minutes or more, 4 minutesor more, 5 minutes or more, 7.5 minutes or more, 10 minutes or more,etc.) prior to contact with an enzymatically active CasZ polypeptide. Insome cases, amplification occurs for 2 minutes or more (e.g., 3 minutesor more, 4 minutes or more, 5 minutes or more, 7.5 minutes or more, 10minutes or more, etc.) prior to contact with an active CasZ polypeptide.In some cases, amplification occurs for a period of time in a range offrom 10 seconds to 60 minutes (e.g., 10 seconds to 40 minutes, 10seconds to 30 minutes, 10 seconds to 20 minutes, 10 seconds to 15minutes, 10 seconds to 10 minutes, 10 seconds to 5 minutes, 30 secondsto 40 minutes, 30 seconds to 30 minutes, 30 seconds to 20 minutes, 30seconds to 15 minutes, 30 seconds to 10 minutes, 30 seconds to 5minutes, 1 minute to 40 minutes, 1 minute to 30 minutes, 1 minute to 20minutes, 1 minute to 15 minutes, 1 minute to 10 minutes, 1 minute to 5minutes, 2 minutes to 40 minutes, 2 minutes to 30 minutes, 2 minutes to20 minutes, 2 minutes to 15 minutes, 2 minutes to 10 minutes, 2 minutesto 5 minutes, 5 minutes to 40 minutes, 5 minutes to 30 minutes, 5minutes to 20 minutes, 5 minutes to 15 minutes, or 5 minutes to 10minutes). In some cases, amplification occurs for a period of time in arange of from 5 minutes to 15 minutes. In some cases, amplificationoccurs for a period of time in a range of from 7 minutes to 12 minutes.

In some cases, a sample is contacted with amplification components at the same time as contact with a CasZ polypeptide. In some such cases, the CasZ polypeptide is inactive at the time of contact and is activated once nucleic acids in the sample have been amplified.

Various amplification methods and components will be known to one of ordinary skill in the art and any convenient method can be used (see, e.g., Zanoli and Spoto, Biosensors (Basel). 2013 March; 3(1): 18-43; Gill and Ghaemi, Nucleosides, Nucleotides, and Nucleic Acids, 2008, 27: 224-243; Craw and Balachandran, Lab Chip, 2012, 12, 2469-2486; which are herein incorporated by reference in their entirety). Nucleic acid amplification can comprise polymerase chain reaction (PCR), reverse transcription PCR (RT-PCR), quantitative PCR (qPCR), reverse transcription qPCR (RT-qPCR), nested PCR, multiplex PCR, asymmetric PCR, touchdown PCR, random primer PCR, hemi-nested PCR, polymerase cycling assembly (PCA), colony PCR, ligase chain reaction (LCR), digital PCR, methylation specific-PCR (MSP), co-amplification at lower denaturation temperature-PCR (COLD-PCR), allele-specific PCR, intersequence-specific PCR (ISS-PCR), whole genome amplification (WGA), inverse PCR, and thermal asymmetric interlaced PCR (TAIL-PCR).

In some cases, the amplification is isothermal amplification. The term “isothermal amplification” indicates a method of nucleic acid (e.g., DNA) amplification (e.g., using enzymatic chain reaction) that can use a single temperature incubation thereby obviating the need for a thermal cycler. Isothermal amplification is a form of nucleic acid amplification which does not rely on the thermal denaturation of the target nucleic acid during the amplification reaction and hence may not require multiple rapid changes in temperature. Isothermal nucleic acid amplification methods can therefore be carried out inside or outside of a laboratory environment. By combining with a reverse transcription step, these amplification methods can be used to isothermally amplify RNA.

Examples of isothermal amplification methods include but are not limited to: loop-mediated isothermal amplification (LAMP), helicase-dependent amplification (HDA), recombinase polymerase amplification (RPA), strand displacement amplification (SDA), nucleic acid sequence-based amplification (NASBA), transcription mediated amplification (TMA), nicking enzyme amplification reaction (NEAR), rolling circle amplification (RCA), multiple displacement amplification (MDA), Ramification (RAM), circular helicase-dependent amplification (cHDA), single primer isothermal amplification (SPIA), signal mediated amplification of RNA technology (SMART), self-sustained sequence replication (3SR), genome exponential amplification reaction (GEAR), and isothermal multiple displacement amplification (IMDA).

In some cases, the amplification is recombinase polymerase amplification (RPA) (see, e.g., U.S. Pat. Nos. 8,030,000; 8,426,134; 8,945,845; 9,309,502; and 9,663,820, which are hereby incorporated by reference in their entirety). Recombinase polymerase amplification (RPA) uses two opposing primers (much like PCR) and employs three enzymes: a recombinase, a single-stranded DNA-binding protein (SSB), and a strand-displacing polymerase. The recombinase pairs oligonucleotide primers with homologous sequence in duplex DNA, SSB binds to displaced strands of DNA to prevent the primers from being displaced, and the strand-displacing polymerase begins DNA synthesis where the primer has bound to the target DNA. Adding a reverse transcriptase enzyme to an RPA reaction can facilitate detection of RNA as well as DNA, without the need for a separate step to produce cDNA. One example of components for an RPA reaction is as follows (see, e.g., U.S. Pat. Nos. 8,030,000; 8,426,134; 8,945,845; 9,309,502; 9,663,820): 50 mM Tris pH 8.4, 80 mM potassium acetate, 10 mM magnesium acetate, 2 mM DTT, 5% PEG compound (Carbowax-20M), 3 mM ATP, 30 mM phosphocreatine, 100 ng/μl creatine kinase, 420 ng/μl gp32, 140 ng/μl UvsX, 35 ng/μl UvsY, 200 μM dNTPs, 300 nM each oligonucleotide, 35 ng/μl Bsu polymerase, and a nucleic acid-containing sample.

In a transcription mediated amplification (TMA), an RNA polymerase is used to make RNA from a promoter engineered in the primer region, and then a reverse transcriptase synthesizes cDNA from the primer. A third enzyme, e.g., RNase H, can then be used to degrade the RNA target from cDNA without a heat-denaturation step. This amplification technique is similar to Self-Sustained Sequence Replication (3SR) and Nucleic Acid Sequence Based Amplification (NASBA), but varies in the enzymes employed. For another example, helicase-dependent amplification (HDA) utilizes a thermostable helicase (Tte-UvrD) rather than heat to unwind dsDNA to create single strands that are then available for hybridization and extension of primers by polymerase. For yet another example, a loop mediated amplification (LAMP) employs a thermostable polymerase with strand displacement capabilities and a set of four or more specifically designed primers. Each primer is designed to have hairpin ends that, once displaced, snap into a hairpin to facilitate self-priming and further polymerase extension. In a LAMP reaction, though the reaction proceeds under isothermal conditions, an initial heat denaturation step is required for double-stranded targets. In addition, amplification yields a ladder pattern of various length products. For yet another example, a strand displacement amplification (SDA) combines the ability of a restriction endonuclease to nick the unmodified strand of its target DNA and an exonuclease-deficient DNA polymerase to extend the 3′ end at the nick and displace the downstream DNA strand.

Detector DNA

In some cases, a subject method includes contacting a sample (e.g., a sample comprising a target DNA and a plurality of non-target ssDNAs) with: i) a CasZ polypeptide; ii) a guide RNA; and iii) a detector DNA that is single stranded and does not hybridize with the guide sequence of the guide RNA.

A suitable single-stranded detector DNA has a length of from 7nucleotides to 25 nucleotides. For example, a suitable single-strandeddetector DNA has a length of from 7 nucleotides to 10 nucleotides, from11 nucleotides to 15 nucleotides, from 15 nucleotides to 20 nucleotides,or from 20 nucleotides to 25 nucleotides. In some cases, a suitablesingle-stranded detector DNA has a length of from 10 nucleotides to 15nucleotides. In some cases, a suitable single-stranded detector DNA hasa length of 10 nucleotides. In some cases, a suitable single-strandeddetector DNA has a length of 11 nucleotides. In some cases, a suitablesingle-stranded detector DNA has a length of 12 nucleotides. In somecases, a suitable single-stranded detector DNA has a length of 13nucleotides. In some cases, a suitable single-stranded detector DNA hasa length of 14 nucleotides. In some cases, a suitable single-strandeddetector DNA has a length of 15 nucleotides.

In some cases, a subject method includes: a) contacting a sample with alabeled single stranded detector DNA (detector ssDNA) that includes afluorescence-emitting dye pair; a CasZ polypeptide that cleaves thelabeled detector ssDNA after it is activated (by binding to the guideRNA in the context of the guide RNA hybridizing to a target DNA); and b)measuring the detectable signal that is produced by thefluorescence-emitting dye pair. For example, in some cases, a subjectmethod includes contacting a sample with a labeled detector ssDNAcomprising a fluorescence resonance energy transfer (FRET) pair or aquencher/fluor pair, or both. In some cases, a subject method includescontacting a sample with a labeled detector ssDNA comprising a FRETpair. In some cases, a subject method includes contacting a sample witha labeled detector ssDNA comprising a fluor/quencher pair.

Fluorescence-emitting dye pairs comprise a FRET pair or a quencher/fluorpair. In both cases of a FRET pair and a quencher/fluor pair, theemission spectrum of one of the dyes overlaps a region of the absorptionspectrum of the other dye in the pair. As used herein, the term“fluorescence-emitting dye pair” is a generic term used to encompassboth a “fluorescence resonance energy transfer (FRET) pair” and a“quencher/fluor pair,” both of which terms are discussed in more detailbelow. The term “fluorescence-emitting dye pair” is used interchangeablywith the phrase “a FRET pair and/or a quencher/fluor pair.”

In some cases (e.g., when the detector ssDNA includes a FRET pair) thelabeled detector ssDNA produces an amount of detectable signal prior tobeing cleaved, and the amount of detectable signal that is measured isreduced when the labeled detector ssDNA is cleaved. In some cases, thelabeled detector ssDNA produces a first detectable signal prior to beingcleaved (e.g., from a FRET pair) and a second detectable signal when thelabeled detector ssDNA is cleaved (e.g., from a quencher/fluor pair). Assuch, in some cases, the labeled detector ssDNA comprises a FRET pairand a quencher/fluor pair.

In some cases, the labeled detector ssDNA comprises a FRET pair. FRET is a process by which radiationless transfer of energy occurs from an excited state fluorophore to a second chromophore in close proximity. The range over which the energy transfer can take place is limited to approximately 10 nanometers (100 angstroms), and the efficiency of transfer is extremely sensitive to the separation distance between fluorophores. Thus, as used herein, the term “FRET” (“fluorescence resonance energy transfer”; also known as “Förster resonance energy transfer”) refers to a physical phenomenon involving a donor fluorophore and a matching acceptor fluorophore selected so that the emission spectrum of the donor overlaps the excitation spectrum of the acceptor, and further selected so that when donor and acceptor are in close proximity (usually 10 nm or less) to one another, excitation of the donor will cause excitation of and emission from the acceptor, as some of the energy passes from donor to acceptor via a quantum coupling effect. Thus, a FRET signal serves as a proximity gauge of the donor and acceptor; only when they are in close proximity to one another is a signal generated. The FRET donor moiety (e.g., donor fluorophore) and FRET acceptor moiety (e.g., acceptor fluorophore) are collectively referred to herein as a “FRET pair”.
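
The distance sensitivity noted above can be illustrated with the standard Förster relation, E = 1 / (1 + (r/R0)^6), where r is the donor-acceptor separation and R0 is the Förster radius (the separation at which transfer efficiency is 50%). The Python below is a minimal sketch; the function name is hypothetical, and the value of R0 used in the example is an assumption chosen for illustration, since R0 depends on the particular FRET pair and is not specified in the present disclosure.

```python
def fret_efficiency(distance_nm: float, forster_radius_nm: float) -> float:
    """Return FRET transfer efficiency from the standard Förster relation.

    distance_nm is the donor-acceptor separation; forster_radius_nm is the
    pair-specific separation at which the efficiency equals 0.5.
    """
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and Förster radius must be > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


# Assumed Förster radius of 5 nm, for illustration only.
for r in (2.5, 5.0, 10.0):
    print(f"r = {r} nm -> E = {fret_efficiency(r, 5.0):.3f}")
```

Because of the sixth-power dependence, doubling the separation from R0 to 2 x R0 lowers the efficiency from 0.5 to 1/65, which is consistent with the loss of FRET signal when a donor and acceptor are separated by cleavage.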

The donor-acceptor pair (a FRET donor moiety and a FRET acceptor moiety) is referred to herein as a “FRET pair” or a “signal FRET pair.” Thus, in some cases, a subject labeled detector ssDNA includes two signal partners (a signal pair), where one signal partner is a FRET donor moiety and the other signal partner is a FRET acceptor moiety. A subject labeled detector ssDNA that includes such a FRET pair (a FRET donor moiety and a FRET acceptor moiety) will thus exhibit a detectable signal (a FRET signal) when the signal partners are in close proximity (e.g., while on the same ssDNA molecule), but the signal will be reduced (or absent) when the partners are separated (e.g., after cleavage of the ssDNA molecule by a CasZ polypeptide).

FRET donor and acceptor moieties (FRET pairs) will be known to one of ordinary skill in the art and any convenient FRET pair (e.g., any convenient donor and acceptor moiety pair) can be used. Examples of suitable FRET pairs include but are not limited to those presented in Table 6. See also: Bajar et al. Sensors (Basel). 2016 Sep. 14; 16(9); and Abraham et al. PLoS One. 2015 Aug. 3; 10(8):e0134436.

TABLE 6
Examples of FRET pairs (donor and acceptor FRET moieties)

Donor                               Acceptor
Tryptophan                          Dansyl
IAEDANS (1)                         DDPM (2)
BFP                                 DsRFP
Dansyl                              Fluorescein isothiocyanate (FITC)
Dansyl                              Octadecylrhodamine
Cyan fluorescent protein (CFP)      Green fluorescent protein (GFP)
CF (3)                              Texas Red
Fluorescein                         Tetramethylrhodamine
Cy3                                 Cy5
GFP                                 Yellow fluorescent protein (YFP)
BODIPY FL (4)                       BODIPY FL (4)
Rhodamine 110                       Cy3
Rhodamine 6G                        Malachite Green
FITC                                Eosin Thiosemicarbazide
B-Phycoerythrin                     Cy5
Cy5                                 Cy5.5

(1) 5-(2-iodoacetylaminoethyl)aminonaphthalene-1-sulfonic acid
(2) N-(4-dimethylamino-3,5-dinitrophenyl)maleimide
(3) carboxyfluorescein succinimidyl ester
(4) 4,4-difluoro-4-bora-3a,4a-diaza-s-indacene

In some cases, a detectable signal is produced when the labeled detectorssDNA is cleaved (e.g., in some cases, the labeled detector ssDNAcomprises a quencher/fluor pair). One signal partner of a signalquenching pair produces a detectable signal and the other signal partneris a quencher moiety that quenches the detectable signal of the firstsignal partner (i.e., the quencher moiety quenches the signal of thesignal moiety such that the signal from the signal moiety is reduced(quenched) when the signal partners are in proximity to one another,e.g., when the signal partners of the signal pair are in closeproximity).

For example, in some cases, an amount of detectable signal increaseswhen the labeled detector ssDNA is cleaved. For example, in some cases,the signal exhibited by one signal partner (a signal moiety) is quenchedby the other signal partner (a quencher signal moiety), e.g., when bothare present on the same ssDNA molecule prior to cleavage by a CasZpolypeptide. Such a signal pair is referred to herein as a“quencher/fluor pair”, “quenching pair”, or “signal quenching pair.” Forexample, in some cases, one signal partner (e.g., the first signalpartner) is a signal moiety that produces a detectable signal that isquenched by the second signal partner (e.g., a quencher moiety). Thesignal partners of such a quencher/fluor pair will thus produce adetectable signal when the partners are separated (e.g., after cleavageof the detector ssDNA by a CasZ polypeptide), but the signal will bequenched when the partners are in close proximity (e.g., prior tocleavage of the detector ssDNA by a CasZ polypeptide).

A quencher moiety can quench a signal from the signal moiety (e.g.,prior to cleavage of the detector ssDNA by a CasZ polypeptide) tovarious degrees. In some cases, a quencher moiety quenches the signalfrom the signal moiety where the signal detected in the presence of thequencher moiety (when the signal partners are in proximity to oneanother) is 95% or less of the signal detected in the absence of thequencher moiety (when the signal partners are separated). For example,in some cases, the signal detected in the presence of the quenchermoiety can be 90% or less, 80% or less, 70% or less, 60% or less, 50% orless, 40% or less, 30% or less, 20% or less, 15% or less, 10% or less,or 5% or less of the signal detected in the absence of the quenchermoiety. In some cases, no signal (e.g., above background) is detected inthe presence of the quencher moiety.

In some cases, the signal detected in the absence of the quencher moiety (when the signal partners are separated) is at least 1.2 fold greater (e.g., at least 1.3 fold, at least 1.5 fold, at least 1.7 fold, at least 2 fold, at least 2.5 fold, at least 3 fold, at least 3.5 fold, at least 4 fold, at least 5 fold, at least 7 fold, at least 10 fold, at least 20 fold, or at least 50 fold greater) than the signal detected in the presence of the quencher moiety (when the signal partners are in proximity to one another).
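
The two ways of expressing the degree of quenching described above (the quenched signal as a percentage of the unquenched signal, and the fold increase upon separation of the signal partners) are reciprocal views of the same pair of measurements. The Python below is a minimal sketch of that arithmetic; the function name and the example readings are hypothetical.

```python
from typing import Tuple


def quench_metrics(
    signal_quenched: float, signal_unquenched: float
) -> Tuple[float, float]:
    """Return (residual_percent, fold_increase) for a quencher/fluor pair.

    residual_percent: signal in the presence of the quencher moiety, as a
        percentage of the signal in its absence.
    fold_increase: signal in the absence of the quencher moiety divided by
        the signal in its presence.
    """
    if signal_quenched <= 0 or signal_unquenched <= 0:
        raise ValueError("signals must be positive, background-corrected values")
    residual_percent = 100.0 * signal_quenched / signal_unquenched
    fold_increase = signal_unquenched / signal_quenched
    return residual_percent, fold_increase


# Hypothetical fluorescence readings (arbitrary units) before and after cleavage.
residual, fold = quench_metrics(signal_quenched=50.0, signal_unquenched=1000.0)
print(f"quenched signal is {residual:.1f}% of unquenched; {fold:.1f} fold increase")
```

For these example readings, a quenched signal that is 5% of the unquenched signal corresponds to a 20 fold increase when the signal partners are separated.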

In some cases, the signal moiety is a fluorescent label. In some suchcases, the quencher moiety quenches the signal (the light signal) fromthe fluorescent label (e.g., by absorbing energy in the emission spectraof the label). Thus, when the quencher moiety is not in proximity withthe signal moiety, the emission (the signal) from the fluorescent labelis detectable because the signal is not absorbed by the quencher moiety.Any convenient donor acceptor pair (signal moiety/quencher moiety pair)can be used and many suitable pairs are known in the art.

In some cases, the quencher moiety absorbs energy from the signal moiety (also referred to herein as a “detectable label”) and then emits a signal (e.g., light at a different wavelength). Thus, in some cases, the quencher moiety is itself a signal moiety (e.g., a signal moiety can be 6-carboxyfluorescein while the quencher moiety can be 6-carboxy-tetramethylrhodamine), and in some such cases, the pair could also be a FRET pair. In some cases, a quencher moiety is a dark quencher. A dark quencher can absorb excitation energy and dissipate the energy in a different way (e.g., as heat). Thus, a dark quencher has minimal to no fluorescence of its own (does not emit fluorescence). Examples of dark quenchers are further described in U.S. Pat. Nos. 8,822,673 and 8,586,718; U.S. patent publications 20140378330, 20140349295, and 20140194611; and international patent applications: WO200142505 and WO200186001, all of which are hereby incorporated by reference in their entirety.

Examples of fluorescent labels include, but are not limited to: an AlexaFluor® dye, an ATTO dye (e.g., ATTO 390, ATTO 425, ATTO 465, ATTO 488,ATTO 495, ATTO 514, ATTO 520, ATTO 532, ATTO Rho6G, ATTO 542, ATTO 550,ATTO 565, ATTO Rho3B, ATTO Rho11, ATTO Rho12, ATTO Thio12, ATTO Rho101,ATTO 590, ATTO 594, ATTO Rho13, ATTO 610, ATTO 620, ATTO Rho14, ATTO633, ATTO 647, ATTO 647N, ATTO 655, ATTO Oxa12, ATTO 665, ATTO 680, ATTO700, ATTO 725, ATTO 740), a DyLight dye, a cyanine dye (e.g., Cy2, Cy3,Cy3.5, Cy3b, Cy5, Cy5.5, Cy7, Cy7.5), a FluoProbes dye, a Sulfo Cy dye,a Seta dye, an IRIS Dye, a SeTau dye, an SRfluor dye, a Square dye,fluorescein isothiocyanate (FITC), tetramethylrhodamine (TRITC), TexasRed, Oregon Green, Pacific Blue, Pacific Green, Pacific Orange, quantumdots, and a tethered fluorescent protein.

In some cases, a detectable label is a fluorescent label selected from:an Alexa Fluor® dye, an ATTO dye (e.g., ATTO 390, ATTO 425, ATTO 465,ATTO 488, ATTO 495, ATTO 514, ATTO 520, ATTO 532, ATTO Rho6G, ATTO 542,ATTO 550, ATTO 565, ATTO Rho3B, ATTO Rho11, ATTO Rho12, ATTO Thio12,ATTO Rho101, ATTO 590, ATTO 594, ATTO Rho13, ATTO 610, ATTO 620, ATTORho14, ATTO 633, ATTO 647, ATTO 647N, ATTO 655, ATTO Oxa12, ATTO 665,ATTO 680, ATTO 700, ATTO 725, ATTO 740), a DyLight dye, a cyanine dye(e.g., Cy2, Cy3, Cy3.5, Cy3b, Cy5, Cy5.5, Cy7, Cy7.5), a FluoProbes dye,a Sulfo Cy dye, a Seta dye, an IRIS Dye, a SeTau dye, an SRfluor dye, aSquare dye, fluorescein (FITC), tetramethylrhodamine (TRITC), Texas Red,Oregon Green, Pacific Blue, Pacific Green, and Pacific Orange.

In some cases, a detectable label is a fluorescent label selected from:an Alexa Fluor® dye, an ATTO dye (e.g., ATTO 390, ATTO 425, ATTO 465,ATTO 488, ATTO 495, ATTO 514, ATTO 520, ATTO 532, ATTO Rho6G, ATTO 542,ATTO 550, ATTO 565, ATTO Rho3B, ATTO Rho11, ATTO Rho12, ATTO Thio12,ATTO Rho101, ATTO 590, ATTO 594, ATTO Rho13, ATTO 610, ATTO 620, ATTORho14, ATTO 633, ATTO 647, ATTO 647N, ATTO 655, ATTO Oxa12, ATTO 665,ATTO 680, ATTO 700, ATTO 725, ATTO 740), a DyLight dye, a cyanine dye(e.g., Cy2, Cy3, Cy3.5, Cy3b, Cy5, Cy5.5, Cy7, Cy7.5), a FluoProbes dye,a Sulfo Cy dye, a Seta dye, an IRIS Dye, a SeTau dye, an SRfluor dye, aSquare dye, fluorescein (FITC), tetramethylrhodamine (TRITC), Texas Red,Oregon Green, Pacific Blue, Pacific Green, Pacific Orange, a quantumdot, and a tethered fluorescent protein.

Examples of ATTO dyes include, but are not limited to: ATTO 390, ATTO425, ATTO 465, ATTO 488, ATTO 495, ATTO 514, ATTO 520, ATTO 532, ATTORho6G, ATTO 542, ATTO 550, ATTO 565, ATTO Rho3B, ATTO Rho11, ATTO Rho12,ATTO Thio12, ATTO Rho101, ATTO 590, ATTO 594, ATTO Rho13, ATTO 610, ATTO620, ATTO Rho14, ATTO 633, ATTO 647, ATTO 647N, ATTO 655, ATTO Oxa12,ATTO 665, ATTO 680, ATTO 700, ATTO 725, and ATTO 740.

Examples of AlexaFluor dyes include, but are not limited to: AlexaFluor® 350, Alexa Fluor® 405, Alexa Fluor® 430, Alexa Fluor® 488, AlexaFluor® 500, Alexa Fluor® 514, Alexa Fluor® 532, Alexa Fluor® 546, AlexaFluor® 555, Alexa Fluor® 568, Alexa Fluor® 594, Alexa Fluor® 610, AlexaFluor® 633, Alexa Fluor® 635, Alexa Fluor® 647, Alexa Fluor® 660, AlexaFluor® 680, Alexa Fluor® 700, Alexa Fluor® 750, Alexa Fluor® 790, andthe like.

Examples of quencher moieties include, but are not limited to: a darkquencher, a Black Hole Quencher® (BHQ®) (e.g., BHQ-0, BHQ-1, BHQ-2,BHQ-3), a Qxl quencher, an ATTO quencher (e.g., ATTO 540Q, ATTO 580Q,and ATTO 612Q), dimethylaminoazobenzenesulfonic acid (Dabsyl), IowaBlack RQ, Iowa Black FQ, IRDye QC-1, a QSY dye (e.g., QSY 7, QSY 9, QSY21), AbsoluteQuencher, Eclipse, and metal clusters such as goldnanoparticles, and the like.

In some cases, a quencher moiety is selected from: a dark quencher, aBlack Hole Quencher® (BHQ®) (e.g., BHQ-0, BHQ-1, BHQ-2, BHQ-3), a Qxlquencher, an ATTO quencher (e.g., ATTO 540Q, ATTO 580Q, and ATTO 612Q),dimethylaminoazobenzenesulfonic acid (Dabsyl), Iowa Black RQ, Iowa BlackFQ, IRDye QC-1, a QSY dye (e.g., QSY 7, QSY 9, QSY 21),AbsoluteQuencher, Eclipse, and a metal cluster.

Examples of an ATTO quencher include, but are not limited to: ATTO 540Q,ATTO 580Q, and ATTO 612Q. Examples of a Black Hole Quencher® (BHQ®)include, but are not limited to: BHQ-0 (493 nm), BHQ-1 (534 nm), BHQ-2(579 nm) and BHQ-3 (672 nm).

For examples of some detectable labels (e.g., fluorescent dyes) and/orquencher moieties, see, e.g., Bao et al., Annu Rev Biomed Eng. 2009;11:25-47; as well as U.S. Pat. Nos. 8,822,673 and 8,586,718; U.S. patentpublications 20140378330, 20140349295, 20140194611, 20130323851,20130224871, 20110223677, 20110190486, 20110172420, 20060179585 and20030003486; and international patent applications: WO200142505 andWO200186001, all of which are hereby incorporated by reference in theirentirety.

In some cases, cleavage of a labeled detector ssDNA can be detected by measuring a colorimetric read-out. For example, the liberation of a fluorophore (e.g., liberation from a FRET pair, liberation from a quencher/fluor pair, and the like) can result in a wavelength shift (and thus color shift) of a detectable signal. Thus, in some cases, cleavage of a subject labeled detector ssDNA can be detected by a color-shift. Such a shift can be expressed as a loss of an amount of signal of one color (wavelength), a gain in the amount of another color, a change in the ratio of one color to another, and the like.
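
As a non-limiting illustration of expressing a color-shift as a change in the ratio of one color to another, the Python sketch below compares the ratio of two channel intensities before and after cleavage. The function name, the assignment of channels, and the example intensities are hypothetical and are provided only to show the arithmetic.

```python
from typing import Tuple


def color_ratio_change(
    before: Tuple[float, float], after: Tuple[float, float]
) -> float:
    """Return the fold change in the (channel A / channel B) intensity ratio.

    before and after are (channel_a, channel_b) intensities measured prior to
    and following cleavage of the labeled detector ssDNA, respectively.
    """
    a0, b0 = before
    a1, b1 = after
    if min(a0, b0, a1, b1) <= 0:
        raise ValueError("intensities must be positive")
    return (a1 / b1) / (a0 / b0)


# Hypothetical readings: channel A gains signal while channel B loses signal.
print(color_ratio_change(before=(100.0, 400.0), after=(300.0, 200.0)))
```

A ratio-based readout of this kind is less sensitive to variation in total detector ssDNA concentration than a single-channel intensity, because both channels scale together.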

Kits for Detecting Target DNA

The present disclosure provides a kit for detecting a target DNA, e.g.,in a sample comprising a plurality of DNAs. In some cases, the kitcomprises: (a) a labeled detector ssDNA (e.g., a labeled detector ssDNAcomprising a fluorescence-emitting dye pair, e.g., a FRET pair and/or aquencher/fluor pair); and (b) one or more of: (i) a guide RNA, and/or anucleic acid encoding said guide RNA; and ii) a CasZ polypeptide, and/ora nucleic acid encoding said CasZ polypeptide. In some cases, a nucleicacid encoding a guide RNA includes sequence insertion sites for theinsertion of guide sequences by a user.

In some cases, the kit comprises: (a) a labeled detector ssDNA (e.g., a labeled detector ssDNA comprising a fluorescence-emitting dye pair, e.g., a FRET pair and/or a quencher/fluor pair); and (b) one or more of: (i) a guide RNA, and/or a nucleic acid encoding said guide RNA; (ii) a tranc RNA, and/or a nucleic acid encoding said tranc RNA; and (iii) a CasZ polypeptide, and/or a nucleic acid encoding said CasZ polypeptide. In some cases, a nucleic acid encoding a guide RNA includes sequence insertion sites for the insertion of guide sequences by a user.

In some cases, the kit comprises: (a) a labeled detector ssDNA (e.g., a labeled detector ssDNA comprising a fluorescence-emitting dye pair, e.g., a FRET pair and/or a quencher/fluor pair); and (b) one or more of: (i) a single-molecule RNA comprising a guide RNA and a tranc RNA, and/or a nucleic acid encoding said single-molecule RNA; and (ii) a CasZ polypeptide, and/or a nucleic acid encoding said CasZ polypeptide. In some cases, a nucleic acid encoding a single-molecule RNA includes sequence insertion sites for the insertion of guide sequences by a user.

In some cases, a subject kit comprises: (a) a labeled detector ssDNA comprising a fluorescence-emitting dye pair, e.g., a FRET pair and/or a quencher/fluor pair; and (b) one or more of: (i) a guide RNA, and/or a nucleic acid encoding said guide RNA; and/or (ii) a CasZ polypeptide.

Positive Controls

A kit of the present disclosure (e.g., one that comprises a labeled detector ssDNA and a CasZ polypeptide) can also include a positive control target DNA. In some cases, the kit also includes a positive control guide RNA that comprises a nucleotide sequence that hybridizes to the control target DNA. In some cases, the positive control target DNA is provided in various amounts, in separate containers. In some cases, the positive control target DNA is provided in various known concentrations, in separate containers, along with control non-target DNAs.

Nucleic Acids

While the RNAs of the disclosure (e.g., guide RNAs, tranc RNAs,single-molecule RNAs comprising a guide RNA and a tranc RNA) can besynthesized using any convenient method (e.g., chemical synthesis, invitro using an RNA polymerase enzyme, e.g., T7 polymerase, T3polymerase, SP6 polymerase, etc.), nucleic acids encoding such RNAs arealso envisioned. Additionally, while a CasZ polypeptide of thedisclosure can be provided (e.g., as part of a kit) in protein form,nucleic acids (such as mRNA and/or DNA) encoding the CasZ polypeptidecan also be provided.

In some cases, a kit of the present disclosure comprises a nucleic acid(e.g., a DNA, e.g., a recombinant expression vector) that comprises anucleotide sequence encoding a single-molecule RNA comprising: i) aguide RNA; and ii) a tranc RNA. In some cases, the nucleotide sequenceencodes the guide RNA portion of the single-molecule RNA without a guidesequence. For example, in some cases, the nucleic acid comprises anucleotide sequence encoding: i) a constant region of a guide RNA (aguide RNA without a guide sequence), and comprises an insertion site fora nucleic acid encoding a guide sequence; and ii) a tranc RNA.

For example, in some cases, a kit of the present disclosure comprises a nucleic acid (e.g., a DNA, e.g., a recombinant expression vector) that comprises a nucleotide sequence encoding a guide RNA. In some cases, the nucleotide sequence encodes a guide RNA without a guide sequence. For example, in some cases, the nucleic acid comprises a nucleotide sequence encoding a constant region of a guide RNA (a guide RNA without a guide sequence), and comprises an insertion site for a nucleic acid encoding a guide sequence. In some cases, a kit of the present disclosure comprises a nucleic acid (e.g., an mRNA, a DNA, e.g., a recombinant expression vector) that comprises a nucleotide sequence encoding a CasZ polypeptide.
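
By way of illustration only, the insertion-site arrangement described above can be sketched in Python. The sketch models a DNA that encodes a guide RNA constant region and carries a marked insertion site, and then fills that site with a guide-sequence-encoding insert. The constant-region sequence, the placeholder token, the position of the insertion site relative to the constant region, and the function names are all hypothetical and are not taken from the present disclosure.

```python
# Hypothetical placeholder marking the insertion site in the template.
INSERTION_SITE = "[GUIDE]"

# Hypothetical DNA encoding a guide RNA constant region. This is an arbitrary
# illustrative sequence, NOT a CasZ repeat from the disclosure. The insertion
# site is placed 3' of the constant region purely as an assumption.
CONSTANT_REGION_TEMPLATE = "ATATATGCGCGCATATAT" + INSERTION_SITE

VALID_BASES = set("ACGT")


def insert_guide_sequence(template: str, guide_dna: str) -> str:
    """Return the template with its single insertion site replaced by guide_dna."""
    guide_dna = guide_dna.upper()
    if not guide_dna or set(guide_dna) - VALID_BASES:
        raise ValueError("guide sequence must be non-empty and contain only A, C, G, T")
    if template.count(INSERTION_SITE) != 1:
        raise ValueError("template must contain exactly one insertion site")
    return template.replace(INSERTION_SITE, guide_dna)


def transcribe(dna: str) -> str:
    """Return the RNA having the same sequence as the DNA coding strand (T -> U)."""
    return dna.upper().replace("T", "U")


if __name__ == "__main__":
    # Hypothetical 20-nt guide-sequence-encoding insert.
    encoded = insert_guide_sequence(CONSTANT_REGION_TEMPLATE, "ACGTACGTACGTACGTACGT")
    print(transcribe(encoded))
```

In this sketch a single template can be reused with any number of different guide-sequence inserts, which mirrors the purpose of providing a constant region together with an insertion site.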

In some cases, the guide RNA-encoding nucleotide sequence is operably linked to a promoter, e.g., a promoter that is functional in a prokaryotic cell, a promoter that is functional in a eukaryotic cell, a promoter that is functional in a mammalian cell, a promoter that is functional in a human cell, and the like. In some cases, a nucleotide sequence encoding a CasZ polypeptide is operably linked to a promoter, e.g., a promoter that is functional in a prokaryotic cell, a promoter that is functional in a eukaryotic cell, a promoter that is functional in a mammalian cell, a promoter that is functional in a human cell, a cell type-specific promoter, a regulatable promoter, a tissue-specific promoter, and the like.

Utility

CasZ compositions (e.g., expression vectors, kits, compositions, nucleic acids, and the like) find use in a variety of methods. For example, a CasZ composition of the present disclosure can be used to (i) modify (e.g., cleave, e.g., nick; methylate; etc.) a target nucleic acid (DNA or RNA; single stranded or double stranded); (ii) modulate transcription of a target nucleic acid; (iii) label a target nucleic acid; (iv) bind a target nucleic acid (e.g., for purposes of isolation, labeling, imaging, tracking, etc.); (v) modify a polypeptide (e.g., a histone) associated with a target nucleic acid; and the like. Thus, the present disclosure provides a method of modifying a target nucleic acid. In some cases, a method of the present disclosure for modifying a target nucleic acid comprises contacting the target nucleic acid with: a) a CasZ polypeptide of the present disclosure; and b) one or more (e.g., two) CasZ guide RNAs. In some cases, a method of the present disclosure for modifying a target nucleic acid comprises contacting the target nucleic acid with: a) a CasZ polypeptide; b) one or more (e.g., two) CasZ guide RNAs; and c) a CasZ trancRNA. In some cases, a method of the present disclosure for modifying a target nucleic acid comprises contacting the target nucleic acid with: a) a CasZ polypeptide of the present disclosure; b) a CasZ guide RNA; and c) a donor nucleic acid (e.g., a donor template). In some cases, a method of the present disclosure for modifying a target nucleic acid comprises contacting the target nucleic acid with: a) a CasZ polypeptide; b) a CasZ guide RNA; c) a CasZ trancRNA; and d) a donor nucleic acid (e.g., a donor template). In some cases, the contacting step is carried out in a cell in vitro. In some cases, the contacting step is carried out in a cell in vivo. In some cases, the contacting step is carried out in a cell ex vivo.

Because a method that uses a CasZ polypeptide includes binding of the CasZ polypeptide to a particular region in a target nucleic acid (by virtue of being targeted there by an associated CasZ guide RNA), the methods are generally referred to herein as methods of binding (e.g., a method of binding a target nucleic acid). However, it is to be understood that in some cases, while a method of binding may result in nothing more than binding of the target nucleic acid, in other cases, the method can have different final results (e.g., the method can result in modification of the target nucleic acid, e.g., cleavage/methylation/etc.; modulation of transcription from the target nucleic acid; modulation of translation of the target nucleic acid; genome editing; modulation of a protein associated with the target nucleic acid; isolation of the target nucleic acid; etc.).

For examples of suitable methods (e.g., that are used with CRISPR/Cas9 systems), see, for example, Jinek et al., Science. 2012 Aug. 17; 337(6096):816-21; Chylinski et al., RNA Biol. 2013 May; 10(5):726-37; Ma et al., Biomed Res Int. 2013; 2013:270805; Hou et al., Proc Natl Acad Sci USA. 2013 Sep. 24; 110(39):15644-9; Jinek et al., Elife. 2013; 2:e00471; Pattanayak et al., Nat Biotechnol. 2013 September; 31(9):839-43; Qi et al., Cell. 2013 Feb. 28; 152(5):1173-83; Wang et al., Cell. 2013 May 9; 153(4):910-8; Auer et al., Genome Res. 2013 Oct. 31; Chen et al., Nucleic Acids Res. 2013 Nov. 1; 41(20):e19; Cheng et al., Cell Res. 2013 October; 23(10):1163-71; Cho et al., Genetics. 2013 November; 195(3):1177-80; DiCarlo et al., Nucleic Acids Res. 2013 April; 41(7):4336-43; Dickinson et al., Nat Methods. 2013 October; 10(10):1028-34; Ebina et al., Sci Rep. 2013; 3:2510; Fujii et al., Nucleic Acids Res. 2013 Nov. 1; 41(20):e187; Hu et al., Cell Res. 2013 November; 23(11):1322-5; Jiang et al., Nucleic Acids Res. 2013 Nov. 1; 41(20):e188; Larson et al., Nat Protoc. 2013 November; 8(11):2180-96; Mali et al., Nat Methods. 2013 October; 10(10):957-63; Nakayama et al., Genesis. 2013 December; 51(12):835-43; Ran et al., Nat Protoc. 2013 November; 8(11):2281-308; Ran et al., Cell. 2013 Sep. 12; 154(6):1380-9; Upadhyay et al., G3 (Bethesda). 2013 Dec. 9; 3(12):2233-8; Walsh et al., Proc Natl Acad Sci USA. 2013 Sep. 24; 110(39):15514-5; Xie et al., Mol Plant. 2013 Oct. 9; Yang et al., Cell. 2013 Sep. 12; 154(6):1370-9; and U.S. patents and patent applications: U.S. Pat. Nos. 8,906,616; 8,895,308; 8,889,418; 8,889,356; 8,871,445; 8,865,406; 8,795,965; 8,771,945; 8,697,359; 20140068797; 20140170753; 20140179006; 20140179770; 20140186843; 20140186919; 20140186958; 20140189896; 20140227787; 20140234972; 20140242664; 20140242699; 20140242700; 20140242702; 20140248702; 20140256046; 20140273037; 20140273226; 20140273230; 20140273231; 20140273232; 20140273233; 20140273234; 20140273235; 20140287938; 20140295556; 20140295557; 20140298547; 20140304853; 20140309487; 20140310828; 20140310830; 20140315985; 20140335063; 20140335620; 20140342456; 20140342457; 20140342458; 20140349400; 20140349405; 20140356867; 20140356956; 20140356958; 20140356959; 20140357523; 20140357530; 20140364333; and 20140377868; each of which is hereby incorporated by reference in its entirety.

For example, the present disclosure provides (but is not limited to) methods of cleaving a target nucleic acid; methods of editing a target nucleic acid; methods of modulating transcription from a target nucleic acid; methods of isolating a target nucleic acid; methods of binding a target nucleic acid; methods of imaging a target nucleic acid; methods of modifying a target nucleic acid; and the like.

As used herein, the terms/phrases “contact a target nucleic acid” and “contacting a target nucleic acid”, for example, with a CasZ polypeptide or with a CasZ fusion polypeptide, etc., encompass all methods for contacting the target nucleic acid. For example, a CasZ polypeptide can be provided to a cell as protein, RNA (encoding the CasZ polypeptide), or DNA (encoding the CasZ polypeptide); while a CasZ guide RNA can be provided as a guide RNA or as a nucleic acid encoding the guide RNA, and a CasZ trancRNA can be provided as a trancRNA or as a nucleic acid encoding the trancRNA. As such, when, for example, performing a method in a cell (e.g., inside of a cell in vitro, inside of a cell in vivo, inside of a cell ex vivo), a method that includes contacting the target nucleic acid encompasses the introduction into the cell of any or all of the components in their active/final state (e.g., in the form of a protein(s) for CasZ polypeptide; in the form of a protein for a CasZ fusion polypeptide; in the form of an RNA in some cases for the guide RNA), and also encompasses the introduction into the cell of one or more nucleic acids encoding one or more of the components (e.g., nucleic acid(s) comprising nucleotide sequence(s) encoding a CasZ polypeptide or a CasZ fusion polypeptide, nucleic acid(s) comprising nucleotide sequence(s) encoding guide RNA(s), nucleic acid comprising a nucleotide sequence encoding a donor template, and the like). Because the methods can also be performed in vitro outside of a cell, a method that includes contacting a target nucleic acid (unless otherwise specified) encompasses contacting outside of a cell in vitro, inside of a cell in vitro, inside of a cell in vivo, inside of a cell ex vivo, etc.

In some cases, a method of the present disclosure for modifying a target nucleic acid comprises introducing into a target cell a CasZ locus, e.g., a nucleic acid comprising a nucleotide sequence encoding a CasZ polypeptide as well as nucleotide sequences of about 1 kilobase (kb) to 5 kb in length surrounding the CasZ-encoding nucleotide sequence from a cell (e.g., in some cases a cell that in its natural state (the state in which it occurs in nature) comprises a CasZ locus) comprising a CasZ locus, where the target cell does not normally (in its natural state) comprise a CasZ locus (e.g., in some cases the locus includes a CasZ trancRNA). However, one or more spacer sequences, encoding guide sequences for the encoded crRNA(s), can be modified such that one or more target sequences of interest are targeted. Thus, for example, in some cases, a method of the present disclosure for modifying a target nucleic acid comprises introducing into a target cell a CasZ locus, e.g., a nucleic acid obtained from a source cell (e.g., in some cases a cell that in its natural state (the state in which it occurs in nature) comprises a CasZ locus), where the nucleic acid has a length of from 100 nucleotides (nt) to 5 kb (e.g., from 100 nt to 500 nt, from 500 nt to 1 kb, from 1 kb to 1.5 kb, from 1.5 kb to 2 kb, from 2 kb to 2.5 kb, from 2.5 kb to 3 kb, from 3 kb to 3.5 kb, from 3.5 kb to 4 kb, or from 4 kb to 5 kb in length) and comprises a nucleotide sequence encoding a CasZ polypeptide. As noted above, in some such cases, one or more spacer sequences, encoding guide sequences for the encoded crRNA(s), can be modified such that one or more target sequences of interest are targeted. In some cases, the method comprises introducing into a target cell: i) a CasZ locus; and ii) a donor DNA template. In some cases, the target nucleic acid is in a cell-free composition in vitro. In some cases, the target nucleic acid is present in a target cell. In some cases, the target nucleic acid is present in a target cell, where the target cell is a prokaryotic cell. In some cases, the target nucleic acid is present in a target cell, where the target cell is a eukaryotic cell. In some cases, the target nucleic acid is present in a target cell, where the target cell is a mammalian cell. In some cases, the target nucleic acid is present in a target cell, where the target cell is a plant cell.

In some cases, a method of the present disclosure for modifying a target nucleic acid comprises contacting a target nucleic acid with a CasZ polypeptide of the present disclosure, or with a CasZ fusion polypeptide of the present disclosure. In some cases, a method of the present disclosure for modifying a target nucleic acid comprises contacting a target nucleic acid with a CasZ polypeptide and a CasZ guide RNA. In some cases, a method of the present disclosure for modifying a target nucleic acid comprises contacting a target nucleic acid with a CasZ polypeptide, a CasZ guide RNA, and a CasZ trancRNA. In some cases, a method of the present disclosure for modifying a target nucleic acid comprises contacting a target nucleic acid with a CasZ polypeptide, a first CasZ guide RNA, and a second CasZ guide RNA (and in some cases a CasZ trancRNA). In some cases, a method of the present disclosure for modifying a target nucleic acid comprises contacting a target nucleic acid with a CasZ polypeptide of the present disclosure, a CasZ guide RNA, and a donor DNA template. In some cases, a method of the present disclosure for modifying a target nucleic acid comprises contacting a target nucleic acid with a CasZ polypeptide of the present disclosure, a CasZ guide RNA, a CasZ trancRNA, and a donor DNA template.

In some cases, the target nucleic acid is in a cell-free composition in vitro. In some cases, the target nucleic acid is present in a target cell. In some cases, the target nucleic acid is present in a target cell, where the target cell is a prokaryotic cell. In some cases, the target nucleic acid is present in a target cell, where the target cell is a eukaryotic cell. In some cases, the target nucleic acid is present in a target cell, where the target cell is a mammalian cell. In some cases, the target nucleic acid is present in a target cell, where the target cell is a plant cell.

Target Nucleic Acids and Target Cells of Interest

A target nucleic acid can be any nucleic acid (e.g., DNA, RNA), can be double stranded or single stranded, can be any type of nucleic acid (e.g., a chromosome (genomic DNA), derived from a chromosome, chromosomal DNA, plasmid, viral, extracellular, intracellular, mitochondrial, chloroplast, linear, circular, etc.) and can be from any organism (e.g., as long as the CasZ guide RNA comprises a nucleotide sequence that hybridizes to a target sequence in a target nucleic acid, such that the target nucleic acid can be targeted).
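
The requirement that a guide RNA comprise a nucleotide sequence that hybridizes to a target sequence can be illustrated with a minimal Python sketch. The sketch assumes perfect Watson-Crick complementarity over the full length of the guide sequence and a single-stranded DNA target written 5' to 3' using only A, C, G, and T; it does not model mismatches, wobble pairing, or any other sequence requirement of a CasZ protein. The function names and example sequences are hypothetical.

```python
# DNA base -> RNA base that pairs with it (Watson-Crick pairing only).
PAIRING_RNA_BASE = {"A": "U", "C": "G", "G": "C", "T": "A"}


def hybridizing_rna(target_dna: str) -> str:
    """Return the RNA (5'->3') fully complementary to target_dna (5'->3')."""
    return "".join(PAIRING_RNA_BASE[base] for base in reversed(target_dna.upper()))


def find_target_sites(guide_rna: str, strand_dna: str) -> list[int]:
    """Return 0-based start positions in strand_dna where guide_rna can fully base-pair."""
    guide_rna = guide_rna.upper()
    strand_dna = strand_dna.upper()
    n = len(guide_rna)
    return [
        start
        for start in range(len(strand_dna) - n + 1)
        if hybridizing_rna(strand_dna[start:start + n]) == guide_rna
    ]


if __name__ == "__main__":
    strand = "TTACGGATCCTT"  # hypothetical single-stranded target
    guide = hybridizing_rna("ACGGATCC")  # guide sequence designed against positions 2-9
    print(guide, find_target_sites(guide, strand))
```

For a double-stranded target, the same search would be repeated on the reverse complement strand.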

A target nucleic acid can be DNA or RNA. A target nucleic acid can be double stranded (e.g., dsDNA, dsRNA) or single stranded (e.g., ssRNA, ssDNA). In some cases, a target nucleic acid is single stranded. In some cases, a target nucleic acid is a single stranded RNA (ssRNA). In some cases, a target ssRNA (e.g., a target cell ssRNA, a viral ssRNA, etc.) is selected from: mRNA, rRNA, tRNA, non-coding RNA (ncRNA), long non-coding RNA (lncRNA), and microRNA (miRNA). In some cases, a target nucleic acid is a single stranded DNA (ssDNA) (e.g., a viral DNA). As noted above, in some cases, a target nucleic acid is single stranded.

A target nucleic acid can be located anywhere, for example, outside of a cell in vitro, inside of a cell in vitro, inside of a cell in vivo, inside of a cell ex vivo. Suitable target cells (which can comprise target nucleic acids such as genomic DNA) include, but are not limited to: a bacterial cell; an archaeal cell; a cell of a single-cell eukaryotic organism; a plant cell; an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens, C. agardh, and the like; a fungal cell (e.g., a yeast cell); an animal cell; a cell from an invertebrate animal (e.g. fruit fly, a cnidarian, an echinoderm, a nematode, etc.); a cell of an insect (e.g., a mosquito; a bee; an agricultural pest; etc.); a cell of an arachnid (e.g., a spider; a tick; etc.); a cell from a vertebrate animal (e.g., a fish, an amphibian, a reptile, a bird, a mammal); a cell from a mammal (e.g., a cell from a rodent; a cell from a human; a cell of a non-human mammal; a cell of a rodent (e.g., a mouse, a rat); a cell of a lagomorph (e.g., a rabbit); a cell of an ungulate (e.g., a cow, a horse, a camel, a llama, a vicuña, a sheep, a goat, etc.); a cell of a marine mammal (e.g., a whale, a seal, an elephant seal, a dolphin, a sea lion; etc.)); and the like. Any type of cell may be of interest (e.g. a stem cell, e.g. an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell, a germ cell (e.g., an oocyte, a sperm, an oogonia, a spermatogonia, etc.), an adult stem cell, a somatic cell, e.g. a fibroblast, a hematopoietic cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell; an in vitro or in vivo embryonic cell of an embryo at any stage, e.g., a 1-cell, 2-cell, 4-cell, 8-cell, etc. stage zebrafish embryo; etc.).

Cells may be from established cell lines or they may be primary cells, where “primary cells”, “primary cell lines”, and “primary cultures” are used interchangeably herein to refer to cells and cell cultures that have been derived from a subject and allowed to grow in vitro for a limited number of passages, i.e. splittings, of the culture. For example, primary cultures are cultures that may have been passaged 0 times, 1 time, 2 times, 4 times, 5 times, 10 times, or 15 times, but not enough times to go through the crisis stage. Typically, the primary cell lines are maintained for fewer than 10 passages in vitro. Target cells can be unicellular organisms and/or can be grown in culture. If the cells are primary cells, they may be harvested from an individual by any convenient method. For example, leukocytes may be conveniently harvested by apheresis, leukocytapheresis, density gradient separation, etc., while cells from tissues such as skin, muscle, bone marrow, spleen, liver, pancreas, lung, intestine, stomach, etc. can be conveniently harvested by biopsy.

In some of the above applications, the subject methods may be employed to induce target nucleic acid cleavage, target nucleic acid modification, and/or to bind target nucleic acids (e.g., for visualization, for collecting and/or analyzing, etc.) in mitotic or post-mitotic cells in vivo and/or ex vivo and/or in vitro (e.g., to disrupt production of a protein encoded by a targeted mRNA, to cleave or otherwise modify target DNA, to genetically modify a target cell, and the like). Because the guide RNA provides specificity by hybridizing to target nucleic acid, a mitotic and/or post-mitotic cell of interest in the disclosed methods may include a cell from any organism (e.g. a bacterial cell, an archaeal cell, a cell of a single-cell eukaryotic organism, a plant cell, an algal cell, e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens, C. agardh, and the like, a fungal cell (e.g., a yeast cell), an animal cell, a cell from an invertebrate animal (e.g. fruit fly, cnidarian, echinoderm, nematode, etc.), a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal), a cell from a mammal, a cell from a rodent, a cell from a human, etc.). In some cases, a subject CasZ protein (and/or nucleic acid encoding the protein such as DNA and/or RNA), and/or CasZ guide RNA (and/or a DNA encoding the guide RNA), and/or donor template, and/or RNP can be introduced into an individual (i.e., the target cell can be in vivo) (e.g., a mammal, a rat, a mouse, a pig, a primate, a non-human primate, a human, etc.). In some cases, such an administration can be for the purpose of treating and/or preventing a disease, e.g., by editing the genome of targeted cells.

Plant cells include cells of a monocotyledon, and cells of a dicotyledon. The cells can be root cells, leaf cells, cells of the xylem, cells of the phloem, cells of the cambium, apical meristem cells, parenchyma cells, collenchyma cells, sclerenchyma cells, and the like. Plant cells include cells of agricultural crops such as wheat, corn, rice, sorghum, millet, soybean, etc. Plant cells include cells of agricultural fruit and nut plants, e.g., plants that produce apricots, oranges, lemons, apples, plums, pears, almonds, etc.

Additional examples of target cells are listed above in the section titled “Modified cells.” Non-limiting examples of cells (target cells) include: a prokaryotic cell, eukaryotic cell, a bacterial cell, an archaeal cell, a cell of a single-cell eukaryotic organism, a protozoa cell, a cell from a plant (e.g., cells from plant crops, fruits, vegetables, grains, soy bean, corn, maize, wheat, seeds, tomatoes, rice, cassava, sugarcane, pumpkin, hay, potatoes, cotton, cannabis, tobacco, flowering plants, conifers, gymnosperms, angiosperms, ferns, clubmosses, hornworts, liverworts, mosses, dicotyledons, monocotyledons, etc.), an algal cell (e.g., Botryococcus braunii, Chlamydomonas reinhardtii, Nannochloropsis gaditana, Chlorella pyrenoidosa, Sargassum patens, C. agardh, and the like), seaweeds (e.g. kelp), a fungal cell (e.g., a yeast cell, a cell from a mushroom), an animal cell, a cell from an invertebrate animal (e.g., fruit fly, cnidarian, echinoderm, nematode, etc.), a cell from a vertebrate animal (e.g., fish, amphibian, reptile, bird, mammal), a cell from a mammal (e.g., an ungulate (e.g., a pig, a cow, a goat, a sheep); a rodent (e.g., a rat, a mouse); a non-human primate; a human; a feline (e.g., a cat); a canine (e.g., a dog); etc.), and the like. In some cases, the cell is a cell that does not originate from a natural organism (e.g., the cell can be a synthetically made cell; also referred to as an artificial cell).

A cell can be an in vitro cell (e.g., established cultured cell line). A cell can be an ex vivo cell (cultured cell from an individual). A cell can be an in vivo cell (e.g., a cell in an individual). A cell can be an isolated cell. A cell can be a cell inside of an organism. A cell can be an organism. A cell can be a cell in a cell culture (e.g., in vitro cell culture). A cell can be one of a collection of cells. A cell can be a prokaryotic cell or derived from a prokaryotic cell. A cell can be a bacterial cell or can be derived from a bacterial cell. A cell can be an archaeal cell or derived from an archaeal cell. A cell can be a eukaryotic cell or derived from a eukaryotic cell. A cell can be a plant cell or derived from a plant cell. A cell can be an animal cell or derived from an animal cell. A cell can be an invertebrate cell or derived from an invertebrate cell. A cell can be a vertebrate cell or derived from a vertebrate cell. A cell can be a mammalian cell or derived from a mammalian cell. A cell can be a rodent cell or derived from a rodent cell. A cell can be a human cell or derived from a human cell. A cell can be a microbe cell or derived from a microbe cell. A cell can be a fungal cell or derived from a fungal cell. A cell can be an insect cell. A cell can be an arthropod cell. A cell can be a protozoan cell. A cell can be a helminth cell.

Suitable cells include a stem cell (e.g. an embryonic stem (ES) cell, an induced pluripotent stem (iPS) cell); a germ cell (e.g., an oocyte, a sperm, an oogonia, a spermatogonia, etc.); a somatic cell, e.g. a fibroblast, an oligodendrocyte, a glial cell, a hematopoietic cell, a neuron, a muscle cell, a bone cell, a hepatocyte, a pancreatic cell, etc.

Suitable cells include human embryonic stem cells, fetal cardiomyocytes, myofibroblasts, mesenchymal stem cells, autotransplanted expanded cardiomyocytes, adipocytes, totipotent cells, pluripotent cells, blood stem cells, myoblasts, adult stem cells, bone marrow cells, mesenchymal cells, embryonic stem cells, parenchymal cells, epithelial cells, endothelial cells, mesothelial cells, fibroblasts, osteoblasts, chondrocytes, exogenous cells, endogenous cells, stem cells, hematopoietic stem cells, bone-marrow derived progenitor cells, myocardial cells, skeletal cells, fetal cells, undifferentiated cells, multi-potent progenitor cells, unipotent progenitor cells, monocytes, cardiac myoblasts, skeletal myoblasts, macrophages, capillary endothelial cells, xenogenic cells, allogenic cells, and post-natal stem cells.

In some cases, the cell is an immune cell, a neuron, an epithelial cell, an endothelial cell, or a stem cell. In some cases, the immune cell is a T cell, a B cell, a monocyte, a natural killer cell, a dendritic cell, or a macrophage. In some cases, the immune cell is a cytotoxic T cell. In some cases, the immune cell is a helper T cell. In some cases, the immune cell is a regulatory T cell (Treg).

In some cases, the cell is a stem cell. Stem cells include adult stem cells. Adult stem cells are also referred to as somatic stem cells.

Adult stem cells are resident in differentiated tissue, but retain the properties of self-renewal and ability to give rise to multiple cell types, usually cell types typical of the tissue in which the stem cells are found. Numerous examples of somatic stem cells are known to those of skill in the art, including muscle stem cells; hematopoietic stem cells; epithelial stem cells; neural stem cells; mesenchymal stem cells; mammary stem cells; intestinal stem cells; mesodermal stem cells; endothelial stem cells; olfactory stem cells; neural crest stem cells; and the like.

Stem cells of interest include mammalian stem cells, where the term “mammalian” refers to any animal classified as a mammal, including humans; non-human primates; domestic and farm animals; and zoo, laboratory, sports, or pet animals, such as dogs, horses, cats, cows, mice, rats, rabbits, etc. In some cases, the stem cell is a human stem cell. In some cases, the stem cell is a rodent (e.g., a mouse; a rat) stem cell. In some cases, the stem cell is a non-human primate stem cell.

Stem cells can express one or more stem cell markers, e.g., SOX9, KRT19, KRT7, LGR5, CA9, FXYD2, CDH6, CLDN18, TSPAN8, BPIFB1, OLFM4, CDH17, and PPARGC1A.

In some cases, the stem cell is a hematopoietic stem cell (HSC). HSCs are mesoderm-derived cells that can be isolated from bone marrow, blood, cord blood, fetal liver and yolk sac. HSCs are characterized as CD34⁺ and CD3⁻. HSCs can repopulate the erythroid, neutrophil-macrophage, megakaryocyte and lymphoid hematopoietic cell lineages in vivo. In vitro, HSCs can be induced to undergo at least some self-renewing cell divisions and can be induced to differentiate to the same lineages as is seen in vivo. As such, HSCs can be induced to differentiate into one or more of erythroid cells, megakaryocytes, neutrophils, macrophages, and lymphoid cells.

In other cases, the stem cell is a neural stem cell (NSC). Neural stem cells (NSCs) are capable of differentiating into neurons and glia (including oligodendrocytes and astrocytes). A neural stem cell is a multipotent stem cell which is capable of multiple divisions, and under specific conditions can produce daughter cells which are neural stem cells, or neural progenitor cells that can be neuroblasts or glioblasts, e.g., cells committed to become one or more types of neurons and glial cells respectively. Methods of obtaining NSCs are known in the art.

In other cases, the stem cell is a mesenchymal stem cell (MSC). MSCs, originally derived from the embryonal mesoderm and isolated from adult bone marrow, can differentiate to form muscle, bone, cartilage, fat, marrow stroma, and tendon. Methods of isolating MSCs are known in the art; and any known method can be used to obtain MSCs. See, e.g., U.S. Pat. No. 5,736,396, which describes isolation of human MSCs.

A cell is in some cases a plant cell. A plant cell can be a cell of a monocotyledon. A cell can be a cell of a dicotyledon.

In some cases, the cell is a plant cell. For example, the cell can be a cell of a major agricultural plant, e.g., Barley, Beans (Dry Edible), Canola, Corn, Cotton (Pima), Cotton (Upland), Flaxseed, Hay (Alfalfa), Hay (Non-Alfalfa), Oats, Peanuts, Rice, Sorghum, Soybeans, Sugarbeets, Sugarcane, Sunflowers (Oil), Sunflowers (Non-Oil), Sweet Potatoes, Tobacco (Burley), Tobacco (Flue-cured), Tomatoes, Wheat (Durum), Wheat (Spring), Wheat (Winter), and the like. As another example, the cell is a cell of a vegetable crop, which includes but is not limited to, e.g., alfalfa sprouts, aloe leaves, arrow root, arrowhead, artichokes, asparagus, bamboo shoots, banana flowers, bean sprouts, beans, beet tops, beets, bittermelon, bok choy, broccoli, broccoli rabe (rappini), brussels sprouts, cabbage, cabbage sprouts, cactus leaf (nopales), calabaza, cardoon, carrots, cauliflower, celery, chayote, chinese artichoke (crosnes), chinese cabbage, chinese celery, chinese chives, choy sum, chrysanthemum leaves (tung ho), collard greens, corn stalks, corn-sweet, cucumbers, daikon, dandelion greens, dasheen, dau mue (pea tips), donqua (winter melon), eggplant, endive, escarole, fiddle head ferns, field cress, frisee, gai choy (chinese mustard), gailon, galanga (siam, thai ginger), garlic, ginger root, gobo, greens, hanover salad greens, huauzontle, jerusalem artichokes, jicama, kale greens, kohlrabi, lamb's quarters (quilete), lettuce (bibb), lettuce (boston), lettuce (boston red), lettuce (green leaf), lettuce (iceberg), lettuce (lolla rossa), lettuce (green oak leaf), lettuce (red oak leaf), lettuce (processed), lettuce (red leaf), lettuce (romaine), lettuce (ruby romaine), lettuce (russian red mustard), linkok, lo bok, long beans, lotus root, mache, maguey (agave) leaves, malanga, mesclun mix, mizuna, moap (smooth luffa), moo, moqua (fuzzy squash), mushrooms, mustard, nagaimo, okra, ong choy, onions green, opo (long squash), ornamental corn, ornamental gourds, parsley, parsnips, peas, peppers (bell type), peppers, pumpkins, radicchio, radish sprouts, radishes, rape greens, rhubarb, romaine (baby red), rutabagas, salicornia (sea bean), sinqua (angled/ridged luffa), spinach, squash, straw bales, sugarcane, sweet potatoes, swiss chard, tamarindo, taro, taro leaf, taro shoots, tatsoi, tepeguaje (guaje), tindora, tomatillos, tomatoes, tomatoes (cherry), tomatoes (grape type), tomatoes (plum type), turmeric, turnip tops greens, turnips, water chestnuts, yampi, yams (ñames), yu choy, yuca (cassava), and the like.

A cell is in some cases an arthropod cell. For example, the cell can be a cell of a sub-order, a family, a sub-family, a group, a sub-group, or a species of, e.g., Chelicerata, Myriapoda, Hexapoda, Arachnida, Insecta, Archaeognatha, Thysanura, Palaeoptera, Ephemeroptera, Odonata, Anisoptera, Zygoptera, Neoptera, Exopterygota, Plecoptera, Embioptera, Orthoptera, Zoraptera, Dermaptera, Dictyoptera, Notoptera, Grylloblattidae, Mantophasmatidae, Phasmatodea, Blattaria, Isoptera, Mantodea, Parapneuroptera, Psocoptera, Thysanoptera, Phthiraptera, Hemiptera, Endopterygota or Holometabola, Hymenoptera, Coleoptera, Strepsiptera, Raphidioptera, Megaloptera, Neuroptera, Mecoptera, Siphonaptera, Diptera, Trichoptera, or Lepidoptera.

A cell is in some cases an insect cell. For example, in some cases, the cell is a cell of a mosquito, a grasshopper, a true bug, a fly, a flea, a bee, a wasp, an ant, a louse, a moth, or a beetle.

Donor Polynucleotide (Donor Template)

Guided by a CasZ guide RNA, a CasZ protein in some cases generates site-specific double strand breaks (DSBs) or single strand breaks (SSBs) (e.g., when the CasZ protein is a nickase variant) within double-stranded DNA (dsDNA) target nucleic acids, which are repaired either by non-homologous end joining (NHEJ) or homology-directed recombination (HDR).

In some cases, contacting a target DNA (with a CasZ protein and a CasZ guide RNA) occurs under conditions that are permissive for nonhomologous end joining or homology-directed repair. Thus, in some cases, a subject method includes contacting the target DNA with a donor polynucleotide (e.g., by introducing the donor polynucleotide into a cell), wherein the donor polynucleotide, a portion of the donor polynucleotide, a copy of the donor polynucleotide, or a portion of a copy of the donor polynucleotide integrates into the target DNA. In some cases, the method does not comprise contacting a cell with a donor polynucleotide, and the target DNA is modified such that nucleotides within the target DNA are deleted.

In cases in which a CasZ trancRNA (or nucleic acid encoding same), a CasZ guide RNA (or nucleic acid encoding same), and/or a CasZ protein (or a nucleic acid encoding same, such as an RNA or a DNA, e.g., one or more expression vectors) are coadministered (e.g., contacted with a target nucleic acid, administered to cells, etc.) with a donor polynucleotide sequence that includes at least a segment with homology to the target DNA sequence, the subject methods may be used to add, i.e. insert or replace, nucleic acid material to a target DNA sequence (e.g. to “knock in” a nucleic acid, e.g., one that encodes for a protein, an siRNA, an miRNA, etc.), to add a tag (e.g., 6×His, a fluorescent protein (e.g., a green fluorescent protein; a yellow fluorescent protein, etc.), hemagglutinin (HA), FLAG, etc.), to add a regulatory sequence to a gene (e.g. promoter, polyadenylation signal, internal ribosome entry sequence (IRES), 2A peptide, start codon, stop codon, splice signal, localization signal, etc.), to modify a nucleic acid sequence (e.g., introduce a mutation, remove a disease causing mutation by introducing a correct sequence), and the like. As such, a complex comprising a CasZ guide RNA and CasZ protein (or CasZ guide RNA and CasZ trancRNA and CasZ protein) is useful in any in vitro or in vivo application in which it is desirable to modify DNA in a site-specific, i.e. “targeted”, way, for example gene knock-out, gene knock-in, gene editing, gene tagging, etc., as used in, for example, gene therapy, e.g. to treat a disease or as an antiviral, antipathogenic, or anticancer therapeutic, the production of genetically modified organisms in agriculture, the large scale production of proteins by cells for therapeutic, diagnostic, or research purposes, the induction of iPS cells, biological research, the targeting of genes of pathogens for deletion or replacement, etc.

In applications in which it is desirable to insert a polynucleotide sequence into the genome where a target sequence is cleaved, a donor polynucleotide (a nucleic acid comprising a donor sequence) can also be provided to the cell. By a “donor sequence” or “donor polynucleotide” or “donor template” it is meant a nucleic acid sequence to be inserted at the site cleaved by the CasZ protein (e.g., after dsDNA cleavage, after nicking a target DNA, after dual nicking a target DNA, and the like). The donor polynucleotide can contain sufficient homology to a genomic sequence at the target site, e.g. 70%, 80%, 85%, 90%, 95%, or 100% homology with the nucleotide sequences flanking the target site, e.g. within about 50 bases or less of the target site, e.g. within about 30 bases, within about 15 bases, within about 10 bases, within about 5 bases, or immediately flanking the target site, to support homology-directed repair between it and the genomic sequence to which it bears homology. Approximately 25, 50, 100, or 200 nucleotides, or more than 200 nucleotides, of sequence homology between a donor and a genomic sequence (or any integral value between 10 and 200 nucleotides, or more) can support homology-directed repair. Donor polynucleotides can be of any length, e.g. 10 nucleotides or more, 50 nucleotides or more, 100 nucleotides or more, 250 nucleotides or more, 500 nucleotides or more, 1000 nucleotides or more, 5000 nucleotides or more, etc.

The donor sequence is typically not identical to the genomic sequence that it replaces. Rather, the donor sequence may contain one or more single base changes, insertions, deletions, inversions or rearrangements with respect to the genomic sequence, so long as sufficient homology is present to support homology-directed repair (e.g., for gene correction, e.g., to convert a disease-causing base pair to a non-disease-causing base pair). In some embodiments, the donor sequence comprises a non-homologous sequence flanked by two regions of homology, such that homology-directed repair between the target DNA region and the two flanking sequences results in insertion of the non-homologous sequence at the target region. Donor sequences may also comprise a vector backbone containing sequences that are not homologous to the DNA region of interest and that are not intended for insertion into the DNA region of interest. Generally, the homologous region(s) of a donor sequence will have at least 50% sequence identity to a genomic sequence with which recombination is desired. In certain embodiments, 60%, 70%, 80%, 90%, 95%, 98%, 99%, or 99.9% sequence identity is present. Any value between 1% and 100% sequence identity can be present, depending upon the length of the donor polynucleotide.
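
As a purely illustrative sketch of the donor layout described above (a non-homologous sequence flanked by two regions of homology), the following Python fragment assembles such a donor from a genomic sequence and computes percent identity between two pre-aligned sequences. The function names, the 0-based cut-site convention, and the toy sequences are assumptions made for illustration only and are not part of the disclosure; a practical design would also need to handle alignment gaps, strand orientation, and the sequence requirements of the particular nuclease.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent of matching positions between two pre-aligned, equal-length sequences.

    Assumption: the caller has already aligned the sequences; gaps are not handled.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must be non-empty")
    matches = sum(1 for x, y in zip(seq_a.upper(), seq_b.upper()) if x == y)
    return 100.0 * matches / len(seq_a)


def build_donor(genome: str, cut_site: int, insert: str, arm_length: int) -> str:
    """Return left homology arm + insert + right homology arm.

    Hypothetical convention: cut_site is the 0-based index of the first base
    downstream of the cleavage position, and both arms are copied unchanged
    from the genomic sequence immediately flanking that position.
    """
    if arm_length <= 0:
        raise ValueError("arm_length must be positive")
    if cut_site - arm_length < 0 or cut_site + arm_length > len(genome):
        raise ValueError("homology arms would run past the ends of the sequence")
    left_arm = genome[cut_site - arm_length:cut_site]
    right_arm = genome[cut_site:cut_site + arm_length]
    return left_arm + insert + right_arm


if __name__ == "__main__":
    toy_genome = "AAAACCCCGGGGTTTT"  # toy sequence, not a real locus
    donor = build_donor(toy_genome, cut_site=8, insert="TAG", arm_length=4)
    print(donor)
    # Arms copied directly from the genome are 100% identical to their flanks.
    print(percent_identity(donor[:4], toy_genome[4:8]))
```

In this sketch the arms are exact copies of the flanking genomic sequence; arms carrying deliberate base changes could be scored against the genome with `percent_identity` to check that they stay within a chosen identity range.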

The donor sequence may comprise certain sequence differences as compared to the genomic sequence, e.g. restriction sites, nucleotide polymorphisms, selectable markers (e.g., drug resistance genes, fluorescent proteins, enzymes, etc.), etc., which may be used to assess for successful insertion of the donor sequence at the cleavage site or in some cases may be used for other purposes (e.g., to signify expression at the targeted genomic locus). In some cases, if located in a coding region, such nucleotide sequence differences will not change the amino acid sequence, or will make silent amino acid changes (i.e., changes which do not affect the structure or function of the protein). Alternatively, these sequence differences may include flanking recombination sequences such as FLPs, loxP sequences, or the like, that can be activated at a later time for removal of the marker sequence.
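
By way of a hedged illustration of the first case above (nucleotide differences in a coding region that do not change the amino acid sequence), the following Python sketch translates an in-frame genomic coding segment and the corresponding donor segment with the standard genetic code and compares the results. The helper names and example codons are hypothetical, the standard code is an assumption (some organisms and organelles use variant codes), and the check does not address changes that alter an amino acid without affecting protein structure or function.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_dna: str) -> str:
    """Translate an in-frame DNA coding sequence; '*' marks a stop codon.

    Raises KeyError for characters other than A, C, G, T.
    """
    seq = coding_dna.upper()
    if len(seq) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def is_synonymous(genomic_cds: str, donor_cds: str) -> bool:
    """True if the donor segment encodes the same amino acids as the genomic segment."""
    if len(genomic_cds) != len(donor_cds):
        raise ValueError("segments must have the same length and reading frame")
    return translate(genomic_cds) == translate(donor_cds)


if __name__ == "__main__":
    # GCT GAA and GCC GAG both encode Ala-Glu, so these differences are synonymous.
    print(is_synonymous("GCTGAA", "GCCGAG"))
    # GAA to GAT changes Glu to Asp, so this difference is not synonymous.
    print(is_synonymous("GCTGAA", "GCTGAT"))
```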

In some cases, the donor sequence is provided to the cell as single-stranded DNA. In some cases, the donor sequence is provided to the cell as double-stranded DNA. It may be introduced into a cell in linear or circular form. If introduced in linear form, the ends of the donor sequence may be protected (e.g., from exonucleolytic degradation) by any convenient method and such methods are known to those of skill in the art. For example, one or more dideoxynucleotide residues can be added to the 3′ terminus of a linear molecule and/or self-complementary oligonucleotides can be ligated to one or both ends. See, for example, Chang et al. (1987) Proc. Natl. Acad. Sci. USA 84:4959-4963; Nehls et al. (1996) Science 272:886-889. Additional methods for protecting exogenous polynucleotides from degradation include, but are not limited to, addition of terminal amino group(s) and the use of modified internucleotide linkages such as, for example, phosphorothioates, phosphoramidates, and O-methyl ribose or deoxyribose residues. As an alternative to protecting the termini of a linear donor sequence, additional lengths of sequence may be included outside of the regions of homology that can be degraded without impacting recombination. A donor sequence can be introduced into a cell as part of a vector molecule having additional sequences such as, for example, replication origins, promoters and genes encoding antibiotic resistance. Moreover, donor sequences can be introduced as naked nucleic acid, as nucleic acid complexed with an agent such as a liposome or poloxamer, or can be delivered by viruses (e.g., adenovirus, AAV), as described elsewhere herein for nucleic acids encoding a CasZ guide RNA and/or a CasZ fusion polypeptide and/or donor polynucleotide.

Transgenic, Non-Human Organisms

As described above, in some cases, a nucleic acid (e.g., a recombinant expression vector) of the present disclosure (e.g., a nucleic acid comprising a nucleotide sequence encoding a CasZ polypeptide; a nucleic acid comprising a nucleotide sequence encoding a CasZ fusion polypeptide; etc.), is used as a transgene to generate a transgenic non-human organism that produces a CasZ polypeptide, or a CasZ fusion polypeptide, of the present disclosure. The present disclosure provides a transgenic non-human organism comprising a nucleotide sequence encoding a CasZ polypeptide, or a CasZ fusion polypeptide, of the present disclosure.

Transgenic, Non-Human Animals

The present disclosure provides a transgenic non-human animal, which animal comprises a transgene comprising a nucleic acid comprising a nucleotide sequence encoding a CasZ polypeptide or a CasZ fusion polypeptide. In some embodiments, the genome of the transgenic non-human animal comprises a nucleotide sequence encoding a CasZ polypeptide, or a CasZ fusion polypeptide, of the present disclosure. In some cases, the transgenic non-human animal is homozygous for the genetic modification. In some cases, the transgenic non-human animal is heterozygous for the genetic modification. In some embodiments, the transgenic non-human animal is a vertebrate, for example, a fish (e.g., salmon, trout, zebra fish, gold fish, puffer fish, cave fish, etc.), an amphibian (e.g., frog, newt, salamander, etc.), a bird (e.g., chicken, turkey, etc.), a reptile (e.g., snake, lizard, etc.), a non-human mammal (e.g., an ungulate, e.g., a pig, a cow, a goat, a sheep, etc.; a lagomorph (e.g., a rabbit); a rodent (e.g., a rat, a mouse); a non-human primate; etc.), etc. In some cases, the transgenic non-human animal is an invertebrate. In some cases, the transgenic non-human animal is an insect (e.g., a mosquito; an agricultural pest; etc.). In some cases, the transgenic non-human animal is an arachnid.

Nucleotide sequences encoding a CasZ polypeptide, or a CasZ fusion polypeptide, of the present disclosure can be under the control of (i.e., operably linked to) an unknown promoter (e.g., when the nucleic acid randomly integrates into a host cell genome) or can be under the control of (i.e., operably linked to) a known promoter. Suitable known promoters can be any known promoter and include constitutively active promoters (e.g., CMV promoter), inducible promoters (e.g., heat shock promoter, tetracycline-regulated promoter, steroid-regulated promoter, metal-regulated promoter, estrogen receptor-regulated promoter, etc.), spatially restricted and/or temporally restricted promoters (e.g., a tissue specific promoter, a cell type specific promoter, etc.), etc.

Transgenic Plants

As described above, in some cases, a nucleic acid (e.g., a recombinant expression vector) of the present disclosure (e.g., a nucleic acid comprising a nucleotide sequence encoding a CasZ polypeptide of the present disclosure; a nucleic acid comprising a nucleotide sequence encoding a CasZ fusion polypeptide of the present disclosure; etc.), is used as a transgene to generate a transgenic plant that produces a CasZ polypeptide, or a CasZ fusion polypeptide, of the present disclosure. The present disclosure provides a transgenic plant comprising a nucleotide sequence encoding a CasZ polypeptide, or a CasZ fusion polypeptide, of the present disclosure. In some embodiments, the genome of the transgenic plant comprises a subject nucleic acid. In some embodiments, the transgenic plant is homozygous for the genetic modification. In some embodiments, the transgenic plant is heterozygous for the genetic modification.

Methods of introducing exogenous nucleic acids into plant cells are well known in the art. Such plant cells are considered “transformed,” as defined above. Suitable methods include viral infection (such as double stranded DNA viruses), transfection, conjugation, protoplast fusion, electroporation, particle gun technology, calcium phosphate precipitation, direct microinjection, silicon carbide whiskers technology, Agrobacterium-mediated transformation and the like. The choice of method is generally dependent on the type of cell being transformed and the circumstances under which the transformation is taking place (i.e. in vitro, ex vivo, or in vivo).

Transformation methods based upon the soil bacterium Agrobacterium tumefaciens are particularly useful for introducing an exogenous nucleic acid molecule into a vascular plant. The wild type form of Agrobacterium contains a Ti (tumor-inducing) plasmid that directs production of tumorigenic crown gall growth on host plants. Transfer of the tumor-inducing T-DNA region of the Ti plasmid to a plant genome requires the Ti plasmid-encoded virulence genes as well as T-DNA borders, which are a set of direct DNA repeats that delineate the region to be transferred. An Agrobacterium-based vector is a modified form of a Ti plasmid, in which the tumor inducing functions are replaced by the nucleic acid sequence of interest to be introduced into the plant host.

Agrobacterium-mediated transformation generally employs cointegrate vectors or binary vector systems, in which the components of the Ti plasmid are divided between a helper vector, which resides permanently in the Agrobacterium host and carries the virulence genes, and a shuttle vector, which contains the gene of interest bounded by T-DNA sequences. A variety of binary vectors are well known in the art and are commercially available, for example, from Clontech (Palo Alto, Calif.). Methods of coculturing Agrobacterium with cultured plant cells or wounded tissue such as leaf tissue, root explants, hypocotyledons, stem pieces or tubers, for example, also are well known in the art. See, e.g., Glick and Thompson, (eds.), Methods in Plant Molecular Biology and Biotechnology, Boca Raton, Fla.: CRC Press (1993).

Microprojectile-mediated transformation also can be used to produce a subject transgenic plant. This method, first described by Klein et al. (Nature 327:70-73 (1987)), relies on microprojectiles such as gold or tungsten that are coated with the desired nucleic acid molecule by precipitation with calcium chloride, spermidine or polyethylene glycol. The microprojectile particles are accelerated at high speed into an angiosperm tissue using a device such as the BIOLISTIC PD-1000 (Biorad; Hercules Calif.).

A nucleic acid of the present disclosure (e.g., a nucleic acid (e.g., a recombinant expression vector) comprising a nucleotide sequence encoding a CasZ polypeptide, or a CasZ fusion polypeptide, of the present disclosure) may be introduced into a plant in a manner such that the nucleic acid is able to enter a plant cell(s), e.g., via an in vivo or ex vivo protocol. By “in vivo,” it is meant that the nucleic acid is administered to a living body of a plant, e.g., by infiltration. By “ex vivo” it is meant that cells or explants are modified outside of the plant, and then such cells or organs are regenerated to a plant. A number of vectors suitable for stable transformation of plant cells or for the establishment of transgenic plants have been described, including those described in Weissbach and Weissbach, (1989) Methods for Plant Molecular Biology Academic Press, and Gelvin et al., (1990) Plant Molecular Biology Manual, Kluwer Academic Publishers. Specific examples include those derived from a Ti plasmid of Agrobacterium tumefaciens, as well as those disclosed by Herrera-Estrella et al. (1983) Nature 303: 209, Bevan (1984) Nucl Acid Res. 12: 8711-8721, Klee (1985) Bio/Technology 3: 637-642. Alternatively, non-Ti vectors can be used to transfer the DNA into plants and cells by using free DNA delivery techniques. By using these methods transgenic plants such as wheat, rice (Christou (1991) Bio/Technology 9:957-9 and 4462) and corn (Gordon-Kamm (1990) Plant Cell 2: 603-618) can be produced. An immature embryo can also be a good target tissue for monocots for direct DNA delivery techniques by using the particle gun (Weeks et al. (1993) Plant Physiol 102: 1077-1084; Vasil (1993) Bio/Technology 10: 667-674; Wan and Lemeaux (1994) Plant Physiol 104: 37-48) and for Agrobacterium-mediated DNA transfer (Ishida et al. (1996) Nature Biotech 14: 745-750).
Exemplary methods for introduction of DNA into chloroplasts are biolistic bombardment, polyethylene glycol transformation of protoplasts, and microinjection (Danieli et al Nat. Biotechnol 16:345-348, 1998; Staub et al Nat. Biotechnol 18: 333-338, 2000; O'Neill et al Plant J. 3:729-738, 1993; Knoblauch et al Nat. Biotechnol 17: 906-909; U.S. Pat. Nos. 5,451,513, 5,545,817, 5,545,818, and 5,576,198; in Intl. Application No. WO 95/16783; and in Boynton et al., Methods in Enzymology 217: 510-536 (1993), Svab et al., Proc. Natl. Acad. Sci. USA 90: 913-917 (1993), and McBride et al., Proc. Natl. Acad. Sci. USA 91: 7301-7305 (1994)). Any vector suitable for the methods of biolistic bombardment, polyethylene glycol transformation of protoplasts and microinjection will be suitable as a targeting vector for chloroplast transformation. Any double stranded DNA vector may be used as a transformation vector, especially when the method of introduction does not utilize Agrobacterium.

Plants which can be genetically modified include grains, forage crops, fruits, vegetables, oil seed crops, palms, forestry, and vines. Specific examples of plants which can be modified follow: maize, banana, peanut, field peas, sunflower, tomato, canola, tobacco, wheat, barley, oats, potato, soybeans, cotton, carnations, sorghum, lupin and rice.

The present disclosure provides transformed plant cells, tissues, plants and products that contain the transformed plant cells. A feature of the subject transformed cells, and tissues and products that include the same is the presence of a subject nucleic acid integrated into the genome, and production by plant cells of a CasZ polypeptide, or a CasZ fusion polypeptide, of the present disclosure. Recombinant plant cells of the present invention are useful as populations of recombinant cells, or as a tissue, seed, whole plant, stem, fruit, leaf, root, flower, tuber, grain, animal feed, a field of plants, and the like.

Nucleotide sequences encoding a CasZ polypeptide, or a CasZ fusion polypeptide, of the present disclosure can be under the control of (i.e., operably linked to) an unknown promoter (e.g., when the nucleic acid randomly integrates into a host cell genome) or can be under the control of (i.e., operably linked to) a known promoter. Suitable known promoters can be any known promoter and include constitutively active promoters, inducible promoters, spatially restricted and/or temporally restricted promoters, etc.

EXAMPLES OF NON-LIMITING ASPECTS OF THE DISCLOSURE

Aspects, including embodiments, of the present subject matter described above may be beneficial alone or in combination, with one or more other aspects or embodiments. Without limiting the foregoing description, certain non-limiting aspects of the disclosure, numbered 1-86, are provided below. As will be apparent to those of skill in the art upon reading this disclosure, each of the individually numbered aspects may be used or combined with any of the preceding or following individually numbered aspects. This is intended to provide support for all such combinations of aspects and is not limited to combinations of aspects explicitly provided below:

Aspects

Aspect 1. A method of guiding a CasZ polypeptide to a target sequence of a target nucleic acid, the method comprising contacting the target nucleic acid with an engineered and/or non-naturally occurring complex comprising: (a) a CasZ polypeptide; and (b) a CasZ guide RNA that comprises a guide sequence that hybridizes to a target sequence of the target nucleic acid, and comprises a region that binds to the CasZ polypeptide.

Aspect 2. The method of aspect 1, wherein the method results in modification of the target nucleic acid, modulation of transcription from the target nucleic acid, or modification of a polypeptide associated with a target nucleic acid.

Aspect 3. The method of aspect 2, wherein the target nucleic acid is modified by being cleaved.

Aspect 4. The method of any one of aspects 1-3, wherein the target nucleic acid is selected from: double stranded DNA, single stranded DNA, RNA, genomic DNA, and extrachromosomal DNA.

Aspect 5. The method of any one of aspects 1-4, wherein the guide sequence and the region that binds to the CasZ polypeptide are heterologous to one another.

Aspect 6. The method of any one of aspects 1-5, wherein said contacting results in genome editing.

Aspect 7. The method of any one of aspects 1-5, wherein said contacting takes place outside of a bacterial cell and outside of an archaeal cell.

Aspect 8. The method of any one of aspects 1-5, wherein said contacting takes place in vitro outside of a cell.

Aspect 9. The method of any one of aspects 1-7, wherein said contacting takes place inside of a target cell.

Aspect 10. The method of aspect 9, wherein said contacting comprises: introducing into the target cell at least one of: (a) the CasZ polypeptide, or a nucleic acid encoding the CasZ polypeptide; and (b) the CasZ guide RNA, or a nucleic acid encoding the CasZ guide RNA.

Aspect 11. The method of aspect 10, wherein the nucleic acid encoding the CasZ polypeptide is a non-naturally occurring sequence that is codon optimized for expression in the target cell.

Aspect 12. The method of any one of aspects 9-11, wherein the target cell is a eukaryotic cell.

Aspect 13. The method of any one of aspects 9-12, wherein the target cell is in culture in vitro.

Aspect 14. The method of any one of aspects 9-12, wherein the target cell is in vivo.

Aspect 15. The method of any one of aspects 9-12, wherein the target cell is ex vivo.

Aspect 16. The method of aspect 12, wherein the eukaryotic cell is selected from the group consisting of: a plant cell, a fungal cell, a single cell eukaryotic organism, a mammalian cell, a reptile cell, an insect cell, an avian cell, a fish cell, a parasite cell, an arthropod cell, an arachnid cell, a cell of an invertebrate, a cell of a vertebrate, a rodent cell, a mouse cell, a rat cell, a primate cell, a non-human primate cell, and a human cell.

Aspect 17. The method of any one of aspects 9-16, wherein said contacting further comprises: introducing a DNA donor template into the target cell.

Aspect 18. The method of any one of aspects 1-17, wherein the method comprises contacting the target nucleic acid with a CasZ transactivating noncoding RNA (trancRNA).

Aspect 19. The method of any one of aspects 9-17, wherein said contacting comprises: introducing a CasZ transactivating noncoding RNA (trancRNA) and/or a nucleic acid encoding the CasZ trancRNA into the target cell.

Aspect 20. The method of aspect 18 or aspect 19, wherein the trancRNA comprises a nucleotide sequence having 70% or more (at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%) nucleotide sequence identity with a trancRNA sequence of Table 2.

Aspect 21. A composition comprising an engineered and/or non-naturally occurring complex comprising: (a) a CasZ polypeptide, or a nucleic acid encoding said CasZ polypeptide; and (b) a CasZ guide RNA, or a nucleic acid encoding said CasZ guide RNA, wherein said CasZ guide RNA comprises a guide sequence that is complementary to a target sequence of a target nucleic acid, and comprises a region that can bind to the CasZ polypeptide.

Aspect 22. The composition of aspect 21, further comprising a CasZ transactivating noncoding RNA (trancRNA), or a nucleic acid encoding said CasZ trancRNA.

Aspect 23. A kit comprising an engineered and/or non-naturally occurring complex comprising: (a) a CasZ polypeptide, or a nucleic acid encoding said CasZ polypeptide; and (b) a CasZ guide RNA, or a nucleic acid encoding said CasZ guide RNA, wherein said CasZ guide RNA comprises a guide sequence that is complementary to a target sequence of a target nucleic acid, and comprises a region that can bind to the CasZ polypeptide.

Aspect 24. The kit of aspect 23, further comprising a CasZ transactivating noncoding RNA (trancRNA), or a nucleic acid encoding said CasZ trancRNA.

Aspect 25. A genetically modified eukaryotic cell, comprising at least one of: (a) a CasZ polypeptide, or a nucleic acid encoding said CasZ polypeptide; (b) a CasZ guide RNA, or a nucleic acid encoding said CasZ guide RNA, wherein said CasZ guide RNA comprises a guide sequence that is complementary to a target sequence of a target nucleic acid, and comprises a region that can bind to the CasZ polypeptide; and (c) a CasZ transactivating noncoding RNA (trancRNA), or a nucleic acid encoding said CasZ trancRNA.

Aspect 26. The composition, kit, or eukaryotic cell of any one of the preceding aspects, characterized by at least one of: (a) the nucleic acid encoding said CasZ polypeptide comprises a nucleotide sequence that: (i) encodes the CasZ polypeptide and, (ii) is operably linked to a heterologous promoter; (b) the nucleic acid encoding said CasZ guide RNA comprises a nucleotide sequence that: (i) encodes the CasZ guide RNA and, (ii) is operably linked to a heterologous promoter; and (c) the nucleic acid encoding said CasZ trancRNA comprises a nucleotide sequence that: (i) encodes the CasZ trancRNA and, (ii) is operably linked to a heterologous promoter.

Aspect 27. The composition, kit, or eukaryotic cell of any one of the preceding aspects, for use in a method of therapeutic treatment of a patient.

Aspect 28. The method, composition, kit, or eukaryotic cell of any one of the preceding aspects, wherein at least one of: the nucleic acid encoding said CasZ polypeptide, the nucleic acid encoding said CasZ guide RNA, and the nucleic acid encoding said CasZ trancRNA, is a recombinant expression vector.

Aspect 29. The method, composition, kit, or eukaryotic cell of any one of the preceding aspects, wherein the CasZ guide RNA and/or the CasZ trancRNA comprises one or more of: a modified nucleobase, a modified backbone or non-natural internucleoside linkage, a modified sugar moiety, a Locked Nucleic Acid, a Peptide Nucleic Acid, and a deoxyribonucleotide.

Aspect 30. The method, composition, kit, or eukaryotic cell of any one of the preceding aspects, wherein the CasZ polypeptide is a variant CasZ polypeptide with reduced nuclease activity compared to a corresponding wild type CasZ protein.

Aspect 31. The method, composition, kit, or eukaryotic cell of any one of the preceding aspects, wherein at least one of: the CasZ polypeptide, the nucleic acid encoding the CasZ polypeptide, the CasZ guide RNA, the nucleic acid encoding the CasZ guide RNA, the CasZ trancRNA, and the nucleic acid encoding the CasZ trancRNA; is conjugated to a heterologous moiety.

Aspect 32. The method, composition, kit, or eukaryotic cell of aspect 31, wherein the heterologous moiety is a heterologous polypeptide.

Aspect 33. The method, composition, kit, or eukaryotic cell of any one of the preceding aspects, wherein the CasZ polypeptide has reduced nuclease activity compared to a corresponding wild type CasZ protein, and is fused to a heterologous polypeptide.

Aspect 34. The method, composition, kit, or eukaryotic cell of aspect 33, wherein the heterologous polypeptide: (i) has DNA modifying activity, (ii) exhibits the ability to increase or decrease transcription, and/or (iii) has enzymatic activity that modifies a polypeptide associated with DNA.

Aspect 35. The method, composition, kit, or eukaryotic cell of any one of the preceding aspects, wherein the CasZ polypeptide comprises an amino acid sequence having 70% or more (at least 70%, at least 75%, at least 80%, at least 85%, at least 90%, at least 95%, at least 98%, at least 99%, or 100%) amino acid sequence identity with a CasZ protein of FIG. 1 or FIG. 7.

Aspect 36. The method, composition, kit, or eukaryotic cell of any one of the preceding aspects, wherein the guide sequence and the region that binds to the CasZ polypeptide are heterologous to one another.

Aspect 37. A method of detecting a target DNA in a sample, the method comprising: (a) contacting the sample with: (i) a CasZ polypeptide; (ii) a guide RNA comprising: a region that binds to the CasZ polypeptide, and a guide sequence that hybridizes with the target DNA; and (iii) a detector DNA that is single stranded and does not hybridize with the guide sequence of the guide RNA; and (b) measuring a detectable signal produced by cleavage of the single stranded detector DNA by the CasZ, thereby detecting the target DNA.

Aspect 38. The method of aspect 37, wherein the target DNA is single stranded.

Aspect 39. The method of aspect 37 or 38, wherein the target DNA is double stranded.

Aspect 40. The method of any one of aspects 37-39, wherein the target DNA is viral DNA.

Aspect 41. The method of any one of aspects 37-40, wherein the target DNA is papovavirus, hepadnavirus, herpesvirus, adenovirus, poxvirus, or parvovirus DNA.

Aspect 42. The method of any one of aspects 37-41, wherein the CasZ polypeptide comprises an amino acid sequence having at least 85% amino acid sequence identity to the CasZ amino acid sequence set forth in any one of FIGS. 1 and 7.

Aspect 43. The method of any one of aspects 37-41, wherein the CasZ polypeptide is a Cas14a polypeptide.

Aspect 44. The method according to any one of aspects 37-43, wherein the sample comprises DNA molecules from a cell lysate.

Aspect 45. The method according to any one of aspects 37-44, wherein the sample comprises cells.

Aspect 46. The method according to any one of aspects 37-45, wherein said contacting is carried out inside of a cell in vitro, ex vivo, or in vivo.

Aspect 47. The method according to aspect 46, wherein the cell is a eukaryotic cell.

Aspect 48. The method according to any one of aspects 37-47, wherein the target DNA can be detected at a concentration as low as 10 aM.

Aspect 49. The method according to any one of aspects 37-48, comprising determining an amount of the target DNA present in the sample.

Aspect 50. The method according to aspect 49, wherein said determining comprises: measuring the detectable signal to generate a test measurement; measuring a detectable signal produced by a reference sample or cell to generate a reference measurement; and comparing the test measurement to the reference measurement to determine an amount of target DNA present in the sample.

Aspect 51. The method according to any one of aspects 37-50, wherein measuring a detectable signal comprises one or more of: gold nanoparticle based detection, fluorescence polarization, colloid phase transition/dispersion, electrochemical detection, and semiconductor-based sensing.

Aspect 52. The method according to any one of aspects 37-51, wherein the single stranded detector DNA comprises a fluorescence-emitting dye pair.

Aspect 53. The method according to aspect 52, wherein the fluorescence-emitting dye pair produces an amount of detectable signal prior to cleavage of the single stranded detector DNA, and the amount of detectable signal is reduced after cleavage of the single stranded detector DNA.

Aspect 54. The method according to aspect 52, wherein the single stranded detector DNA produces a first detectable signal prior to being cleaved and a second detectable signal after cleavage of the single stranded detector DNA.

Aspect 55. The method according to any one of aspects 52-54, wherein the fluorescence-emitting dye pair is a fluorescence resonance energy transfer (FRET) pair.

Aspect 56. The method according to aspect 52, wherein an amount of detectable signal increases after cleavage of the single stranded detector DNA.

Aspect 57. The method according to aspect 52 or aspect 56, wherein the fluorescence-emitting dye pair is a quencher/fluor pair.

Aspect 58. The method according to any one of aspects 52-57, wherein the single stranded detector DNA comprises two or more fluorescence-emitting dye pairs.

Aspect 59. The method according to aspect 58, wherein said two or more fluorescence-emitting dye pairs include a fluorescence resonance energy transfer (FRET) pair and a quencher/fluor pair.

Aspect 60. The method according to any one of aspects 37-59, wherein the single stranded detector DNA comprises a modified nucleobase, a modified sugar moiety, and/or a modified nucleic acid linkage.

Aspect 61. The method according to any one of aspects 37-60, wherein the method comprises amplifying nucleic acids in the sample.

Aspect 62. The method according to aspect 61, wherein said amplifying comprises isothermal amplification.

Aspect 63. The method according to aspect 62, wherein the isothermal amplification comprises recombinase polymerase amplification (RPA).

Aspect 64. The method according to any one of aspects 61-63, wherein said amplifying begins prior to the contacting of step (a).

Aspect 65. The method according to any one of aspects 61-63, wherein said amplifying begins together with the contacting of step (a).

Aspect 66. A kit for detecting a target DNA in a sample, the kit comprising: (a) a guide RNA, or a nucleic acid encoding the guide RNA; wherein the guide RNA comprises: a region that binds to a CasZ polypeptide, and a guide sequence that is complementary to a target DNA; and (b) a labeled detector DNA that is single stranded and does not hybridize with the guide sequence of the guide RNA.

Aspect 67. The kit of aspect 66, further comprising a CasZ polypeptide.

Aspect 68. The kit of aspect 67, wherein the CasZ polypeptide comprisesan amino acid sequence having at least 85% amino acid sequence identityto the CasZ amino acid sequence set forth in any one of FIGS. 1 and 7.

Aspect 69. The kit of aspect 67, wherein the CasZ polypeptide is aCas14a polypeptide.

Aspect 70. The kit of any one of aspects 66-69, wherein the singlestranded detector DNA comprises a fluorescence-emitting dye pair.

Aspect 71. The kit of aspect 70, wherein the fluorescence-emitting dyepair is a FRET pair.

Aspect 72. The kit of aspect 70, wherein the fluorescence-emitting dyepair is a quencher/fluor pair.

Aspect 73. The kit of any one of aspects 70-72, wherein the singlestranded detector DNA comprises two or more fluorescence-emitting dyepairs.

Aspect 74. The kit of aspect 73, wherein said two or morefluorescence-emitting dye pairs include a first fluorescence-emittingdye pair that produces a first detectable signal and a secondfluorescence-emitting dye pair that produces a second detectable signal.

Aspect 75. The kit of any one of aspects 66-74, further comprisingnucleic acid amplification components.

Aspect 76. The kit of aspect 75, wherein the nucleic acid amplificationcomponents are components for recombinase polymerase amplification(RPA).

Aspect 77. A method of cleaving single stranded DNAs (ssDNAs), themethod comprising: contacting a population of nucleic acids, whereinsaid population comprises a target DNA and a plurality of non-targetssDNAs, with: (i) a CasZ polypeptide; and (ii) a guide RNA comprising: aregion that binds to the CasZ polypeptide, and a guide sequence thathybridizes with the target DNA, wherein the CasZ polypeptide cleavesnon-target ssDNAs of said plurality.

Aspect 78. The method of aspect 77, wherein said contacting is inside of a cell in vitro, ex vivo, or in vivo.

Aspect 79. The method of aspect 78, wherein the cell is a eukaryotic cell.

Aspect 80. The method of aspect 79, wherein the eukaryotic cell is a plant cell.

Aspect 81. The method of any one of aspects 78-80, wherein the non-target ssDNAs are foreign to the cell.

Aspect 82. The method of aspect 81, wherein the non-target ssDNAs are viral DNAs.

Aspect 83. The method of any one of aspects 77-82, wherein the target DNA is single stranded.

Aspect 84. The method of any one of aspects 77-82, wherein the target DNA is double stranded.

Aspect 85. The method of any one of aspects 77-84, wherein the target DNA is viral DNA.

Aspect 86. The method of any one of aspects 77-84, wherein the target DNA is papovavirus, hepadnavirus, herpesvirus, adenovirus, poxvirus, or parvovirus DNA.

EXAMPLES

The following examples are put forth so as to provide those of ordinary skill in the art with a complete disclosure and description of how to make and use the present invention, and are not intended to limit the scope of what the inventors regard as their invention nor are they intended to represent that the experiments below are all or the only experiments performed. Efforts have been made to ensure accuracy with respect to numbers used (e.g., amounts, temperature, etc.) but some experimental errors and deviations should be accounted for. Unless indicated otherwise, parts are parts by weight, molecular weight is weight average molecular weight, temperature is in degrees Celsius, and pressure is at or near atmospheric. Standard abbreviations may be used, e.g., bp, base pair(s); kb, kilobase(s); pl, picoliter(s); s or sec, second(s); min, minute(s); h or hr, hour(s); aa, amino acid(s); nt, nucleotide(s); i.m., intramuscular(ly); i.p., intraperitoneal(ly); s.c., subcutaneous(ly); and the like.

Materials and Methods

The following materials and methods generally apply to the results presented in the Examples described herein except where noted otherwise.

Metagenomics and Metatranscriptomics

The initial analysis was performed on previously assembled and binned metagenomes from two sites: the Rifle Integrated Field Research (IFRC) site, adjacent to the Colorado River near Rifle, Colo., and Crystal Geyser, a cold, CO₂-driven geyser on the Colorado Plateau in Utah. Metatranscriptomic data from the IFRC site were used to detect transcription of non-coding elements in nature. Further mining of CRISPR-Cas14 systems was then performed on public metagenomes from IMG/M.

CRISPR-Cas Computational Analysis

The assembled contigs from the various samples were scanned with the HMMer suite for known Cas proteins using Hidden Markov Model (HMM) profiles. Additional HMMs were constructed for Cas14 proteins based on the MAFFT alignments of putative type V effectors that contained fewer than 800 aa and were adjacent to acquisition cas genes and CRISPR arrays. These HMMs were iteratively refined by augmenting them with manually selected novel putative Cas14 sequences that were found using the existing Cas14 HMMs. The Cas14 repeat sequences are provided in Table 3. CRISPR arrays were identified using a local version of the CrisprFinder software and CRISPRDetect. Phylogenetic trees of Cas1 and type V effector proteins were constructed using RAxML with PROTGAMMALG as the substitution model and 100 bootstrap samplings. Trees were visualized using FigTree 1.4.1 (http://tree.bio.ed.ac.uk/software/figtree/). Metatranscriptomic reads were mapped to assembled contigs using Bowtie2. RNase presence analysis was based on HMMs that were built from alignments of KEGG orthologous groups (KOs) downloaded from the KEGG database.
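
The candidate-selection criteria described above (putative effectors shorter than 800 aa that lie near both an acquisition cas gene and a CRISPR array) can be illustrated with a minimal Python sketch. The record layout, the feature labels, the function names, and the 10 kb neighborhood window are assumptions made for illustration only; the source does not state how adjacency was measured, and cas1 is used here as a stand-in for the acquisition genes.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    contig: str
    start: int
    end: int
    kind: str           # hypothetical labels: "protein", "cas1", "crispr_array"
    length_aa: int = 0  # only meaningful for kind == "protein"


def distance(a, b):
    """Gap in bp between two features on one contig (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def candidate_effectors(features, max_len_aa=800, window_bp=10_000):
    """Return proteins shorter than max_len_aa that have both a cas1 gene
    and a CRISPR array within window_bp on the same contig.
    window_bp is an assumed value, not taken from the source."""
    hits = []
    for f in features:
        if f.kind != "protein" or f.length_aa >= max_len_aa:
            continue
        neighbors = [g for g in features if g.contig == f.contig and g is not f]
        near_cas1 = any(
            g.kind == "cas1" and distance(f, g) <= window_bp for g in neighbors
        )
        near_array = any(
            g.kind == "crispr_array" and distance(f, g) <= window_bp
            for g in neighbors
        )
        if near_cas1 and near_array:
            hits.append(f)
    return hits
```

In practice the feature list would come from gene calls and CRISPR-array predictions on the assembled contigs; the sketch only shows the filtering logic.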

TABLE 3 Repeat sequences (non-guide sequenceportion of a Cas14 guide RNA) of all Cas14proteins used herein (e.g., see FIG. 7) are shown in Table 3. SEQScaffold Cas14 ID Accession Protein Repeat sequence NO: No: Cas14a.1GTTGCAGAACCCGAATAGACGAATGAAGGAATGCAAC  53 NCBI: (CasZa.3) MK005734Cas14a.2 CTTGCAGAACCCGGATAGACGAATGAAGGAATGCAAC 295 NCBI: (Za.8) MK005733Cas14a.3 GTTGCAGAACCCGAATAGACGAATGAAGGAATGCAAC  53 NCBI: (CasZa.3)MK005732 Cas14a.4 CTATCATATTCAGAACAAAGGGATTAAGGAATGCAAC  54 NCBI:(CasZa.4) MK005735 Cas14a.5 CTTTCATACTCAGAACAAAGGGATTAAGGAATGCAAC  55NCBI: (CasZa.5, MK005736 CasZb.3) Cas14a.6GTCTACAACTCATTGATAGAAATCAATGAGTTAGACA  56 IMG/M: (CasZa.6) Ga0137385_10000156 Cas14b.1 GTTGCAGAAATAGAATAAAGGAATTAAGGAATGCAAC  59 NCBI:(CasZb.2) MK005737 Cas14b.2 CTTTCATACTCAGAACAAAGGGATTAAGGAATGCAAC  55NCBI: (CasZa.5, MK005738 CasZb.3) Cas14b.3ATTTCATACTCAGAACAAAGGGATTAAGGAATGCAAC  61 NCBI: (CasZb.4) MK005739Cas14b.4 GTTTCAGCGCACGAATTAACGAGATGAGAGATGCAAC 303 NCBI: (CasZb.16)MK005740 Cas14b.5 CTTGCAGAAGCTGAATAGACGAATCAAGGAATGCAAC  63 NCBI:(CasZb.6) MK005741 Cas14b.6 CTTGCAGGCCTTGAATAGAGGAGTTAAGGAATGCAAC 296NCBI: (Za.12) MK005742 Cas14b.7 GTTGCAGCGCCCGAACTGACGAGACGAGAGATGCAAC 66 IMG/M: (CasZb.9) Ga0172369_ 10000737 Cas14b.8GTTGCGCGAATAGAATAAAGGAATTAAGGAATGCAAC  67 IMG/M: (CasZb.10) Ga0172369_10010464 Cas14b.9 AGTTGCATTCCTTAATCCCTCTGTTCAGTTTGTGCAAT  68 IMG/M:(CasZb.11) Ga0172365_ 10004421 Cas14b.10GTTGCACAGTGCTAATTAGAGAAACTAGGAATGCAAC 297 NCBI: (Zb.13) MK005743Cas14b.11 GTTGCGGCGCGCGAATAAACGAGACTAGGAATGCAAC  70 NCBI: (CasZc.2)MK005744 Cas14b.12 CTAGCATATTCAGAACAAAGGGATTAAGGAATGCAAC 298 NCBI:(Zb.14) MK005745 Cas14b.13 CTTTCATATTCAGAACAAAGGGATTAAGGAATGCAAC  72NCBI: (CasZc.4) MK005746 Cas14b.14 CTTTCATATTCAGAAACTAGGGGTTAAGGACTGCAAC299 NCBI: (Zb.15) MK005747 Cas14b.15GTTGCAGCCCCCGAACTAACGAGATGAGAGATGCAAC  74 IMG/M: (CasZc.6) Ga0116204_1008574 Cas14b.16 CTTGCAGAACAATCATATATGACTAATCAGACTGCAAC  75 IMG/M:(casZc.7) Ga0078972_ 1001015a Cas14c.1GTTGCATCCCTACGTCGTGAGCACCGGTGAGTGCAAC 300 
NCBI: (Zb.8) MK005748 Cas14c.2GTCCCTACTCGCTAGGGAAACTAATTGAATGGAAAC  77 IMG/M: (CasZd.2) JGI12048J13642_ 10201286 Cas14d.1 CTTCCAAACTCGAGCCAGTGGGGAGAGAAGTGGCA  79 NCBI:(CasZe.2) MK005750 Cas14d.2 CCTGTAGACCGGTCTCATTCTGAGAGGGGTATGCAACT  80NCBI: (CasZe.3) MK005751 Cas14d.3 GTCTCGAGACCCTACAGATTTTGGAGAGGGGTGGGAC 81 NCBI: (CasZe.4) MK005752 Cas14e.1GTAGCAGGACTCTCCTCGAGAGAAACAGGGGTATGCT  83 NCBI: (CasZf.1) MK005753Cas14e.2 GTACAATACCTCTCCTTTAAGAGAGGGAGGGGTACGCTAC  84 NCBI: (CasZf.2)MK005754 Cas14e.3 GGAAAGGAATCCCCTGAAGGAAACGAGGGGG 301 NCBI: (Zc.5)MK005755 Cas14f.1 GGTTCCCCCGGGCGCGGGTGGGGTGGCG  86 NCBI: (CasZg.1)MK005756 Cas14f.2 GGCTGCTCCGGGTGCGCGTGGAGCGAGG  87 IMG/M: (CasZg.2)Ga0105042_ 100140 Cas14g.1 GTGTCCATCAATCAGATTTGCGTTGGCCGGTGCAAT 302NCBI: (Ze.3) MK005758 Cas14g.2 GCCGCAGCGGCCGACGCGGCCCTGATCGATGGACAC  90IMG/M: (CasZi.2) Ga0123330_ 1010394 Cas14h.1GGCTAGCCCGTGCGCGCAGGGACGAGTGG  92 IMG/M: (CasZk.1) Ga0070762_ 10001740Cas14h.2 GCCCGTGCGCGCAGGGACGAGTGG  93 IMG/M: (CasZk.2) Ga0070766_10011912 Cas14h.3 CCATCGCCCCGCGCGCACGTGGATGAGCC  95 IMG/M: (CasZk.4)Ga0116216_ 10000905 Cas14u.1 GTTATAAAGGCGGGGATCGCGACCGAGCGATTGAAAG  57IMG/M: (CasZa.7) Ga0066793_ 10010091 Cas14u.2GTCTCCATGACTGAAAAGTCGTGGCCGAATTGAAAC  65 IMG/M: (CasZb.8) JGI24730J26740_ 1002785 Cas14u.3 GTTGCATTCGGGTGCAAAACAGGGAGTAGAGTGTAAC  78 NCBI:(CasZe.1) MK005749 Cas14u.4 CTTTTAGACAGTTTAAATTCTAAAGGGTATAAAAC 307NCBI: (CasZj.2) MK005757 Cas14u.5 GTCGAAATGCCCGCGCGGGGGCGTCGTACCCGCGAC308 IMG/M: (CasZj.1) Ga0137373_ 10000316 Cas14u.6GTTGCAGCGGCCGACGGAGCGCGAGCGTGGATGCCAC 309 IMG/M: (CasZk.3) Ga0070717_10000077 Cas14u.7 CTTTAGACTTCTCCGGAAGTCGAATTAATGGAAAC 310 IMG/M:(CasZl.1) JGI12210 IMG/M: J13797_ 10004690 Cas14u.8GGGCGCCCCGCGCGAGCGGGGGTTGAAG 311 IMG/M: (CasZl.2) Ga0073904_ 10021651

Generation of Expression Plasmids, RNA and DNA Substrates

Minimal CRISPR loci for putative systems were designed by removing acquisition proteins and generating minimal arrays with a single spacer. These minimal loci were ordered as gBlocks (IDT) and assembled into a plasmid with a tetracycline inducible promoter driving expression of the locus. Plasmid maps are available on Addgene and in the figures. All RNA was in vitro transcribed using T7 polymerase and PCR products as dsDNA template. The resulting in vitro transcripts (IVTs) were gel extracted and ethanol precipitated. DNA substrates were obtained from IDT and their sequences are available in Table 4. For radiolabeled cleavage assays, DNA oligos were gel extracted from a PAGE gel before radiolabeling. For FQ assays, DNA substrates were used without further purification.

E. coli RNAseq

Small RNA sequencing was conducted as previously described in Harrington et al. (2017), with modification. E. coli NEB Stable3 was transformed with a plasmid expressing the Cas14a1 system with a tetracycline inducible promoter upstream of the Cas14a1 ORF, or with the same plasmid with an N-terminal 10×-histidine tag fused to Cas14. Starters were grown up overnight in SOB, diluted 1:100 in 5 mL fresh SOB containing 214 nM anhydrotetracycline and grown up overnight at 25° C. For sequencing of RNA pulled down with Cas14a, the plasmid containing an N-terminal His-tag fused to Cas14a1 was grown up at 18° C. before lysis and purification as described in “Protein purification”, stopping after the Ni-NTA elution. Cells were pelleted and RNA was extracted using hot phenol as previously described. Total nucleic acids were treated with TURBO DNase and phenol extracted. The resulting RNA was treated with rSAP, which was heat inactivated before addition of T4 PNK. Adapters were ligated onto the small RNA using the NEBnext small RNA kit and gel-extracted on an 8% native PAGE gel. RNA was sequenced on a MiSeq with single end 300 bp reads. For analysis, the resulting reads were trimmed using Cutadapt, discarding sequences <8 nt, and mapped to the plasmid reference using Bowtie2.

PAM Depletion Assays

PAM depletion assays were conducted as previously described in Burstein et al. (2017). Randomized plasmid libraries were generated using a primer containing a randomized PAM region adjacent to the target sequence. The randomized primers were hybridized with a primer that was complementary to the 3′ end of the primer and the duplex was extended using Klenow Fragment (NEB). The dsDNA containing the target was digested with EcoRI and NcoI, ligated into a pUC19 backbone and transformed into E. coli DH5α, and >10⁷ cells were harvested. Next, E. coli NEBstable was transformed with either a CRISPR plasmid or an empty vector control, and these transformed E. coli were made electrocompetent by repeated washing with 10% glycerol. These electrocompetent cells were transformed with 200 ng of the target library and plated on bioassay dishes containing selection for the target (carbenicillin, 100 mg l⁻¹) and CRISPR plasmid (chloramphenicol, 30 mg l⁻¹). Cells were harvested and prepared for amplicon sequencing on an Illumina MiSeq. The PAM region was extracted using Cutadapt and depletion values were calculated in Python. PAMs were visualized using WebLogo.
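
The source states only that depletion values were calculated in Python. A minimal sketch of one common way to do this is shown below, assuming that depletion is defined as the ratio of a PAM's normalized frequency in the empty-vector control to its normalized frequency in the CRISPR-plasmid sample. The function name, the pseudocount, and the definition itself are assumptions for illustration, not the method used to generate the reported data.

```python
from collections import Counter


def pam_depletion(control_pams, crispr_pams, pseudocount=1.0):
    """Return {PAM: depletion ratio} from two lists of extracted PAM strings.

    depletion = (normalized frequency in control) /
                (normalized frequency in CRISPR sample)
    Values above 1 indicate a PAM that was lost in the presence of the
    CRISPR plasmid. The pseudocount (an assumed value) avoids division
    by zero for PAMs that vanish entirely.
    """
    ctrl = Counter(control_pams)
    expt = Counter(crispr_pams)
    all_pams = set(ctrl) | set(expt)
    ctrl_total = sum(ctrl.values()) + pseudocount * len(all_pams)
    expt_total = sum(expt.values()) + pseudocount * len(all_pams)
    depletion = {}
    for pam in all_pams:
        f_ctrl = (ctrl[pam] + pseudocount) / ctrl_total
        f_expt = (expt[pam] + pseudocount) / expt_total
        depletion[pam] = f_ctrl / f_expt
    return depletion
```

Under this definition, a system with no PAM-dependent dsDNA interference would give ratios close to 1 for every PAM, which is the pattern expected when no depletion is detected.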

Transcriptomic RNA Mapping

RNA was extracted from 0.2 μm filters using the Invitrogen TRIzol reagent, followed by genomic DNA removal and cleaning using the Qiagen RNase-Free DNase Set kit and the Qiagen Mini RNeasy kit. An Agilent 2100 Bioanalyzer (Agilent Technologies) was used to assess the integrity of the RNA samples. The Applied Biosystems SOLiD Total RNA-Seq kit was used to generate the cDNA template library. The SOLiD EZ Bead system (Life Technologies) was used to perform emulsion clonal bead amplification to generate bead templates for SOLiD platform sequencing. Samples were sequenced at Pacific Northwest National Laboratory on the 5500XL SOLiD platform. The 50 bp single reads were trimmed using Sickle as in Brown et al. (2015).

Protein Purification

Cas14a1 was purified as described previously with modification. E. coli BL21(DE3) RIL were transformed with a 10×His-MBP-Cas14a1 expression plasmid, grown to OD600=0.5 in Terrific Broth (TB) and induced with 0.5 mM IPTG. Cells were grown overnight at 18° C., collected by centrifugation, resuspended in Lysis Buffer (50 mM Tris-HCl, pH 7.5, 20 mM imidazole, 0.5 mM TCEP, 500 mM NaCl) and broken by sonication. Lysate was batch loaded onto Ni-NTA resin and washed with the above buffer before elution with Elution Buffer (50 mM Tris-HCl, pH 7.5, 300 mM imidazole, 0.5 mM TCEP, 500 mM NaCl). The MBP and His-tag were removed by overnight incubation with TEV at 4° C. The resulting protein was exchanged into Buffer A (20 mM HEPES, pH 7.5, 0.5 mM TCEP, 150 mM NaCl), loaded over tandem MBP and heparin columns (GE, Hi-Trap) and eluted with a linear gradient from Buffer A to Buffer B (20 mM HEPES, pH 7.5, 0.5 mM TCEP, 1250 mM NaCl). The resulting fractions containing Cas14a1 were loaded onto an S200 gel filtration column, and the purified protein was flash frozen and stored at −80° C. until use.

In Vitro Cleavage Assays

Radiolabeled

Radiolabeled cleavage assays were conducted in 1× Cleavage Buffer (25 mM NaCl, 20 mM HEPES, pH 7.5, 1 mM DTT, 5% glycerol). 100 nM Cas14a1 was complexed with 125 nM crRNA and 125 nM tracrRNA for 10 min at RT. ~1 nM radiolabeled DNA or RNA substrate was added and allowed to react for 30 min at 37° C. The reaction was stopped by adding 2× Quench Buffer (90% formamide, 25 mM EDTA and trace bromophenol blue), heated to 95° C. for 2 min and run on a 10% polyacrylamide gel containing 7 M urea and 0.5× TBE. Products were visualized by phosphorimaging.

M13 DNA Cleavage

M13 DNA cleavage assays were conducted in 100 mM NaCl, 20 mM HEPES, pH 7.5, 1 mM DTT, 5% glycerol. 250 nM Cas14a1 was complexed with 250 nM crRNA, 250 nM tracrRNA and 250 nM ssDNA activator. The reaction was initiated by addition of 5 nM M13 ssDNA plasmid and was quenched by addition of loading buffer supplemented with 10 mM EDTA. Products were separated on a 1.5% agarose TAE gel prestained with SYBR Gold (Thermo Fisher).

FQ Detection of Trans-Cleavage

FQ detection was conducted as previously described in Chen et al. (2018) with modification. 100 nM Cas14a1 was complexed with 125 nM crRNA, 125 nM tracrRNA, 50 nM FQ probe and 2 nM ssDNA activator in 1× Cleavage Buffer at 37° C. for 10 min. The reaction was then initiated by addition of activator DNA for all reactions except the RNA optimization experiments, where the variable RNA component was used to initiate the reaction. The reaction was monitored in a fluorescence plate reader for up to 120 minutes at 37° C. with fluorescence measurements taken every 1 min (λex: 485 nm; λem: 535 nm). The resulting data were background subtracted using the readings taken in the absence of activator and fit using a single exponential decay curve.
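
The background subtraction and single-exponential fit described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the analysis code used for the reported data: the model form F(t) = A·(1 − exp(−k·t)), the function names, and the grid-search fitting strategy are all assumptions, since the source does not specify the parameterization or the fitting software.

```python
import math


def background_subtract(signal, background):
    """Subtract the no-activator reading from each time point."""
    return [s - b for s, b in zip(signal, background)]


def fit_single_exponential(times, values, k_min=1e-4, k_max=1.0, steps=2000):
    """Least-squares fit of values ~ A * (1 - exp(-k * t)).

    The rate k is scanned over a log-spaced grid (assumed bounds, in 1/min
    when times are in minutes). For each k the best amplitude A has a
    closed-form solution, so only k needs to be searched.
    Returns (A, k) for the grid point with the smallest squared error.
    """
    best = None
    for i in range(steps + 1):
        k = k_min * (k_max / k_min) ** (i / steps)
        basis = [1.0 - math.exp(-k * t) for t in times]
        denom = sum(b * b for b in basis)
        if denom == 0.0:
            continue
        amp = sum(b * v for b, v in zip(basis, values)) / denom
        sse = sum((v - amp * b) ** 2 for b, v in zip(basis, values))
        if best is None or sse < best[2]:
            best = (amp, k, sse)
    return best[0], best[1]
```

A dedicated nonlinear least-squares routine would normally replace the grid search; the grid is used here only to keep the sketch free of external dependencies.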

Data Availability

Plasmids used herein are available on Addgene (plasmid numbers 112500, 112501, 112502, 112503, 112504, 112505, 112506). Oligonucleotides used herein are provided in Table 4 and Table 5. The plasmids used herein are provided in FIG. 24. The Cas14 protein sequences used herein are provided in FIG. 7.

TABLE 4 Oligonucleotides and plasmids used herein. SEQ DNA Name SequenceID NO: Radiolabeld cTACGCCGattatcttctg 220 DNA acaactttcgcaagcggtgactivator 1 T taaggtaAAAAAtgCGGGC strand AC RadiolabeldGTGCCCGcaTTTTTtacct 221 DNA tacaccgcttgcgaaagtt activator 1gtcagaagataatCGGCGT NT strand Ag Radiolabeld tttatatgtttctcctgga 222 DNAgataacgcaatcgtgacaa activator 2 T ctttcgcaagcggtgtaag strandgtaGCAGGCTTCcgaattc cgcgtttttacggc Radiolabeld gccgtaaaaacgcggaatt 223DNA cgGAAGCCTGCtaccttac activator 2 accgcttgcgaaagttgtc NT strandacgattgcgttatctccag gagaaacatataaa Radiolabeld gatcttcagcTATACATTA 224Activator 3 TTGCACCAACACTAAGGCA GAGTATGtttacctggac RadiolabeldgatcttcagcTTTGTATTA 225 Activator 4 CTGGAAGGATGCTTGCTTGAGGTGTAaaaacctggac F-Q 5 nt /56-FAM/TTTTT/ 226 3IABkFQ/ F-Q 6 nt/56-FAM/TTTTTT/ 227 3IABkFQ/ F-Q 7 nt /56-FAM/TTTTTTT/ 228 3IABkFQ/F-Q 8 nt /56-FAM/TTTTTTTT/ 229 3IABkFQ/ F-Q 9 nt /56-FAM/TTTTTTTTT/ 2303IABkFQ/ F-Q 10 nt /56-FAM/TTTTTTTTTT/ 231 3IABkFQ/ F-Q 11 nt/56-FAM/TTTTTTTTTT 232 T/3IABkFQ/ F-Q 12 nt /56-FAM/TTTTTTTTTTT 233T/3IABkFQ/ F-Q 10 nt /56-FAM/TATATATATA/ 371 A/T 3IABkFQ/ F-Q 6 nt A/T/56-FAM/TATATA/ 372 3IABkFQ/ F-Q 5 nt A/T /56-FAM/TATAT/ 373 3IABkFQ/Target 2, GCCGGGGTGGTGCCCATCC 234 Perfect TGGTCGAGCTGGACGGCGA AAAT 3′CGTAAACGGCCACAAGC Target 2, GCCGGGGTGGTGCCCATCC 235 PerfectTGGTCGAGCTGGACGGCGA TCGT 3′ CGTGCTCGGCCACAAGC Target 2, GCCGGGGTGGTGCCCATCC 236 1-2 MM TGGTCGAGCTGGACGGCGA GCTAAACGGCCACAAGCTarget 2,  GCCGGGGTGGTGCCCATCC 237 3-4 MM TGGTCGAGCTGGACGGCCTCGTAAACGGCCACAAGC Target 2,  GCCGGGGTGGTGCCCATCC 238 5-6 MMTGGTCGAGCTGGACGCGGA CGTAAACGGCCACAAGC Target 2,  GCCGGGGTGGTGCCCATCC 2397-8 MM TGGTCGAGCTGGAGCGCGA CGTAAACGGCCACAAGC Target 2, GCCGGGGTGGTGCCCATCC 240 9-10 MM TGGTCGAGCTGCTCGGCGA CGTAAACGGCCACAAGCTarget 2,  GCCGGGGTGGTGCCCATCC 241 11-12 MM TGGTCGAGCACGACGGCGACGTAAACGGCCACAAGC Target 2,  GCCGGGGTGGTGCCCATCC 242 13-14 MMTGGTCGACGTGGACGGCGA CGTAAACGGCCACAAGC Target 2,  GCCGGGGTGGTGCCCATCC 24315-16 MM TGGTCCTGCTGGACGGCGA CGTAAACGGCCACAAGC Target 
2,GCCGGGGTGGTGCCCATCC 244 17-18 MM TGGAGGAGCTGGACGGCGA CGTAAACGGCCACAAGCTarget 2,  GCCGGGGTGGTGCCCATCC 245 19-20 MM TCCTCGAGCTGGACGGCGACGTAAACGGCCACAAGC HERC2 Amp G*T*G*T*TAATACAAAGG 246 FwdTACAGGAACAAAGAATTTG HERC2 Amp CAAAGAGAAGCCTCGGCC 247 Rev Target 1,TTTATTCAAGGCAATCACT 248 Perfect ATCAGCTGTGGAACACCCA AAAT 3′GGTAAACTAACACAACT Target 1, TTTATTCAAGGCAATCACT 249 PerfectATCAGCTGTGGAACACCCA TCGT 3′ GGTGCTCTAACACAACT Target 1, TTTATTCAAGGCAATCACT 250 1-2 MM ATCAGCTGTGGAACACCCA CCTAAACTAACACAACTTarget 1,  TTTATTCAAGGCAATCACT 251 3-4 MM ATCAGCTGTGGAACACCGTGGTAAACTAACACAACT Target 1,  TTTATTCAAGGCAATCACT 252 5-6 MMATCAGCTGTGGAACAGGCA GGTAAACTAACACAACT Target 1,  TTTATTCAAGGCAATCACT 2537-8 MM ATCAGCTGTGGAAGTCCCA GGTAAACTAACACAACT Target 1, TTTATTCAAGGCAATCACT 254 9-10 MM ATCAGCTGTGGTTCACCCA GGTAAACTAACACAACTTarget 1,  TTTATTCAAGGCAATCACT 255 11-12 MM ATCAGCTGTCCAACACCCAGGTAAACTAACACAACT Target 1,  TTTATTCAAGGCAATCACT 256 13-14 MMATCAGCTCAGGAACACCCA GGTAAACTAACACAACT Target 1,  TTTATTCAAGGCAATCACT 25715-16 MM ATCAGGAGTGGAACACCCA GGTAAACTAACACAACT Target 1, TTTATTCAAGGCAATCACT 258 17-18 MM ATCTCCTGTGGAACACCCA GGTAAACTAACACAACTTarget 1,  TTTATTCAAGGCAATCACT 259 19-20 MM AAGAGCTGTGGAACACCCAGGTAAACTAACACAACT Target 3, cTACGCCGattatcttctg 220 Perfectacaactttcgcaagcggtg taaggtaAAAAAtgCGGGC AC Target 3, cTACGCCGattatcttctg 260 1-2 MM acaactttcgcaagcggtg taaggCGAAAAAtgCGGGCAC Target 3,  cTACGCCGattatcttctg 261 3-4 MM acaactttcgcaagcggtgtaaAAtaAAAAAtgCGGGC AC Target 3,  cTACGCCGattatcttctg 262 5-6 MMacaactttcgcaagcggtg tGGggtaAAAAAtgCGGGC AC Target 3, cTACGCCGattatcttctg 263 7-8 MM acaactttcgcaagcggtA CaaggtaAAAAAtgCGGGCAC Target 3,  cTACGCCGattatcttctg 264 9-10 MM acaactttcgcaagcgACgtaaggtaAAAAAtgCGGGC AC Target 3,  cTACGCCGattatcttctg 265 11-12 MMacaactttcgcaagTAgtg taaggtaAAAAAtgCGGGC AC Target 3, cTACGCCGattatcttctg 266 13-14 MM acaactttcgcaGAcggtg taaggtaAAAAAtgCGGGCAC Target 3,  cTACGCCGattatcttctg 267 15-16 MM acaactttcgTGagcggtgtaaggtaAAAAAtgCGGGC AC Target 3,  
cTACGCCGattatcttctg 268 17-18 MMacaactttTAcaagcggtg taaggtaAAAAAtgCGGGC AC Target 3, cTACGCCGattatcttctg 269 19-20 MM acaactCCcgcaagcggtg taaggtaAAAAAtgCGGGCAC Target 3,  cTACGCCGattatcttctg 270 21-22 MM acaaTCttcgcaagcggtgtaaggtaAAAAAtgCGGGC AC Target 3,  cTACGCCGattatcttctg 271 23-24 MMacGGctttcgcaagcggtg taaggtaAAAAAtgCGGGC AC Target 3, cTACGCCGattatcttctg 272 25-26 MM GTaactttcgcaagcggtg taaggtaAAAAAtgCGGGCAC Full Length cTACGCCGattatcttctg 220 Activator acaactttcgcaagcggtgtaaggtaAAAAAtgCGGGC AC -20 5′ tatcttctgacaactttcg 273 activatorcaagcggtgtaaggtaAAA target AAtgCGGGCAC -25 5′ tctgacaactttcgcaagc 274activator ggtgtaaggtaAAAAAtgC target GGGCAC -30 5′ caactttcgcaagcggtgt275 activator aaggtaAAAAAtgCGGGCA target C -35 5′ ttcgcaagcggtgtaaggt276 activator aAAAAAtgCGGGCAC target -5 3′ cTACGCCGattatcttctg 277activator acaactttcgcaagcggtg target taaggtaAAAAAtgCG -9 3′cTACGCCGattatcttctg 278 activator acaactttcgcaagcggtg target taaggtaAAAA-14 3′ cTACGCCGattatcttctg 279 activator acaactttcgcaagcggtg target ta-19 3′ cTACGCCGattatcttctg 280 activator acaactttcgcaagcg target -24 3′cTACGCCGattatcttctg 281 activator acaactttcgc target -29 3′cTACGCCGattatcttctg 282 activator acaact target -34 3′cTACGCCGattatcttctg 283 activator a target No loop caactttcgcaagcggtgt284 aaggtaAAAAAtgCG 0 nt SL atggaatgtggcgaacgct 285 ttcaacGAAAcaactttcgcaagcggtgtaaggtaAAA AAtgCG 5 nt SL atggaatgtggcgaacgct 286tagttgGAAAcaactttcg caagcggtgtaaggtaAAA AAtgCG 10 nt SLatggaatgtggcgaagcga 287 aagttgGAAAcaactttcg caagcggtgtaaggtaAAA AAtgCG15 nt SL atggaatgtgcgcttgcga 288 aagttgGAAAcaactttcg caagcggtgtaaggtaAAAAAtgCG 20 nt SL atggatacaccgcttgcga 289 aagttgGAAAcaactttcgcaagcggtgtaaggtaAAA AAtgCG 25 nt SL taccttacaccgcttgcga 290aagttgGAAAcaactttcg caagcggtgtaaggtaAAA AAtgCG M13_1 OligoGTTTTATCTTCTGCTGGTG 291 GTTCGTTCGGTATTTTTAA TG M13_2 OligoCATTAAAAATACCGAACGA 292 ACCACCAGCAGAAGATAAA AC M13_3 OligoGACCATTTGCGAAATGTAT 293 CTAATGGTCAAACTAAATC TACTC M13_4 OligoGAGTAGATTTAGTTTGACC 294 ATTAGATACATTTCGCAAA 
TGGTC

TABLE 5 RNA Oligonucleotides. SEQ RNA Name Sequence ID NO: ssRNAcTACGCCGattatcttctgac 332 target aactttcgcaagcggtgtaaggtaAAAAAtgCGGGCACcc crRNA GGAATGCAACtaccttacacc 333 gcttgcgaa tracrRNACTTCACTGATAAAGTGGAGAA 334 CCGCTTCACCAAAAGCTGTCC CTTAGGGGATTAGAACTTGAGTGAAGGTGGGCTGCTTGCATC AGCCTAATGTCGAGAAGTGCT TTCTTCGGAAAGTAACCCTCGAAACAAATTCATTT crRNA 1 GACGAATGAAGGAATGCAACt 335 accttacaccgcttgcgaacrRNA 2 GACGAATGAAGGAATGCAACc 336 cttacaccgcttgcgaaag crRNA 3GACGAATGAAGGAATGCAACt 337 tacaccgcttgcgaaagtt crRNA 4GACGAATGAAGGAATGCAACa 338 caccgcttgcgaaagttgt crRNA MMGACGAATGAAGGAATGCAACC 339 target 2 GTCGCCGTCCAGCTCGACCA crRNA MMGACGAATGAAGGAATGCAACG 340 target 3 ATCGTTACGCTAACTATGA HERC2 ATTCACTGATAAAGTGGAGAAC 341 sgRNA CGCTTCACCAAAAGCTGTCCCTTAGGGGATTAGAACTTGAGT GAAGGTGGGCTGCTTGCATCA GCCTAATGTCGAGAAGTGCTTTCTTCGGAAAGTAACCCTCGA AACAAATTCATTTgaaaGAAT GAAGGAATGCAACacttgacacttaatgctcaa LbCas12a GGGTAATTTCTACTAAGTGTA 342 HERC2GATacttgacacttaatgctc crRNA aa 25 nt spacer GACGAATGAAGGAATGCAACt 343crRNA target accttacaccgcttgcgaaag 4 ttg 20 nt spacerGACGAATGAAGGAATGCAACt 335 crRNA target accttacaccgcttgcgaa 418 nt spacer GACGAATGAAGGAATGCAACt 344 crRNA target accttacaccgcttgcg 416 nt spacer GACGAATGAAGGAATGCAACt 345 crRNA target accttacaccgcttg 414 nt spacer GACGAATGAAGGAATGCAACt 346 crRNA target accttacaccgct 412 nt spacer GACGAATGAAGGAATGCAACt 347 crRNA target accttacaccg 410 nt spacer GACGAATGAAGGAATGCAACt 348 crRNA target accttacac 4Full repeat GTTGCAGAACCCGAATAGACG 349 crRNA target AATGAAGGAATGCAACtacct4 tacaccgcttgcgaa 20 nt repeat GACGAATGAAGGAATGCAACt 335 crRNA targetaccttacaccgcttgcgaa 4 17 nt repeat GAATGAAGGAATGCAACtacc 350crRNA target ttacaccgcttgcgaa 4 15 nt repeat ATGAAGGAATGCAACtacctt 351crRNA target acaccgcttgcgaa 4 10 nt repeat GGAATGCAACtaccttacacc 333crRNA target gcttgcgaa 4 tracrRNA CTTCACTGATAAAGTGGAGAA 352 +41 ntCCGCTTCACCAAAAGCTGTCC CTTAGGGGATTAGAACTTGAG TGAAGGTGGGCTGCTTGCATCAGCCTAATGTCGAGAAGTGCT TTCTTCGGAAAGTAACCCTCG AAACAAATTCATTTTTCCTCTCCAATTCTGCACAAAAAAAGG 
TGAGTCCTTAT tracrRNA CTTCACTGATAAAGTGGAGAA 334+3 nt CCGCTTCACCAAAAGCTGTCC CTTAGGGGATTAGAACTTGAG TGAAGGTGGGCTGCTTGCATCAGCCTAATGTCGAGAAGTGCT TTCTTCGGAAAGTAACCCTCG AAACAAATTCATTT tracrRNACTTCACTGATAAAGTGGAGAA 353 -26 CCGCTTCACCAAAAGCTGTCCCTTAGGGGATTAGAACTTGAG TGAAGGTGGGCTGCTTGCATC AGCCTAATGTCGAGAAGTGCT TTCTTtracrRNA CTTCACTGATAAAGTGGAGAA 354 -65 CCGCTTCACCAAAAGCTGTCCCTTAGGGGATTAGAACTTGAG TGAAGGTGG tracrRNA TTCACTGATAAAGTGGAGAAC 355 +0CGCTTCACCAAAAGCTGTCCC TTAGGGGATTAGAACTTGAGT GAAGGTGGGCTGCTTGCATCAGCCTAATGTCGAGAAGTGCTT TCTTCGGAAAGTAACCCTCGA AACAAATTCA tracrRNAttcacacTTCACTGATAAAGT 356 +90 nt GGAGAACCGCTTCACCAAAAGCTGTCCCTTAGGGGATTAGAA CTTGAGTGAAGGTGGGCTGCT TGCATCAGCCTAATGTCGAGAAGTGCTTTCTTCGGAAAGTAA CCCTCGAAACAAATTCAtttt tcctctccaattctgcacaaaaaaaggtgagtccttataaac cggcgtgcagaacgccggctc accttttttcttcattcgatt ttasgRNA 1 CTTCACTGATAAAGTGGAGAA 357 CCGCTTCACCAAAAGCTGTCCCTTAGGGGATTAGAACTTGAG TGAAGGTGGGCTGCTTGCATC AGCCTAATGTCGAGAAGTGCTTTCTTCGGAAAGTAACCCTCG AAACAAATTCATTTgaaaGAA TGAAGGAATGCAACtaccttacaccgcttgcgaa sgRNA 2 CTTCACTGATAAAGTGGAGAA 358 CCGCTTCACCAAAAGCTGTCCCTTAGGGGATTAGAACTTGAG TGAAGGTGGGCTGCTTGCATC AGCCTAATGTCGAGAAGTGCTTTCTTCGGAAAGTAACCCTCG AAACAAATTCATTTTTCCTCT CCAATTCTGCACAAgaaaGTTGCAGAACCCGAATAGACGAAT GAAGGAATGCAACtaccttac accgcttgcgaa M13 target 1GACGAATGAAGGAATGCAACT 359 crRNA ACCGAACGAACCACCAGCAGA AGA M13 target 2GACGAATGAAGGAATGCAACT 360 crRNA CTTCTGCTGGTGGTTCGTTCG GTA M13 target 3GACGAATGAAGGAATGCAACG 361 crRNA TTTGACCATTAGATACATTTC G M13 target 4GACGAATGAAGGAATGCAACC 362 crRNA GAAATGTATCTAATGGTCAAA C
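
The repeat-length and spacer-length crRNA series listed in Table 5 appear to follow a simple pattern: each crRNA consists of the 3′ portion of the Cas14a repeat followed by the 5′ portion of the spacer. The Python sketch below reproduces several of the listed sequences from the full repeat and the 20 nt spacer. The function and variable names are hypothetical, and the sequences are written with the same letters and case used in Table 5.

```python
# Full Cas14a repeat and 20 nt spacer, as written in Table 5.
FULL_REPEAT = "GTTGCAGAACCCGAATAGACGAATGAAGGAATGCAAC"
SPACER_20 = "taccttacaccgcttgcgaa"


def build_crrna(repeat, spacer, repeat_len, spacer_len):
    """Join the last repeat_len nt of the repeat to the first
    spacer_len nt of the spacer (repeat truncated from its 5' end,
    spacer truncated from its 3' end)."""
    if not 0 <= repeat_len <= len(repeat):
        raise ValueError("repeat_len is out of range")
    if not 0 <= spacer_len <= len(spacer):
        raise ValueError("spacer_len is out of range")
    return repeat[len(repeat) - repeat_len:] + spacer[:spacer_len]


# 20 nt repeat + 20 nt spacer (listed as SEQ ID NO: 335 in Table 5)
crrna_20_20 = build_crrna(FULL_REPEAT, SPACER_20, 20, 20)
# 10 nt repeat + 20 nt spacer (listed as SEQ ID NO: 333 in Table 5)
crrna_10_20 = build_crrna(FULL_REPEAT, SPACER_20, 10, 20)
# 20 nt repeat + 18 nt spacer (listed as SEQ ID NO: 344 in Table 5)
crrna_20_18 = build_crrna(FULL_REPEAT, SPACER_20, 20, 18)
```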

Example 1

FIG. 1 depicts examples of naturally occurring CasZ protein sequences.

FIG. 2 depicts schematic representations of CasZ loci, which include a Cas1 protein in addition to the CasZ protein.

FIG. 3 depicts a phylogenetic tree of CasZ sequences in relation to other Class 2 CRISPR/Cas effector protein sequences.

FIG. 4 depicts a phylogenetic tree of Cas1 sequences from CasZ loci in relation to Cas1 sequences from other Class 2 CRISPR/Cas loci.

FIG. 5 depicts transcriptomic RNA mapping data demonstrating expression of trancRNA from CasZ loci. The trancRNAs are adjacent to the CasZ repeat array, but do not include the repeat sequence and are not complementary to the repeat sequence. Shown are RNA mapping data for the following loci: CasZa3, CasZb4, CasZc5, CasZd1, and CasZe3. Small repeating aligned arrows represent the repeats of the CRISPR array (indicating the presence of guide RNA-encoding sequence); the peaks outside and adjacent to the repeat arrays represent highly transcribed trancRNAs.

These metatranscriptomic data were not 16S depleted; hence, large portions of the data mapped to 16S, and mRNA, for example, was almost entirely unrepresented in these reads. Nonetheless, RNA mapping to the predicted trancRNA regions was observed.

Example 2

Disclosed herein is a set of CRISPR-Cas systems from uncultivated archaea that contain Cas14, a family of exceptionally compact RNA-guided nucleases of just 400-700 amino acids. These systems include Cas1 and Cas2 proteins, which are responsible for integrating DNA into CRISPR genomic loci, and showed evidence of actively adapting their CRISPR arrays to new infections. Despite their small size, Cas14 proteins were capable of RNA-guided single-stranded DNA (ssDNA) cleavage without restrictive sequence requirements. Moreover, target recognition by Cas14 triggered non-specific cutting of ssDNA molecules. Metagenomic data showed that multiple CRISPR-Cas14 systems evolved independently and suggested a potential evolutionary origin of single-effector CRISPR-based adaptive immunity.

Competition between microbes and viruses stimulated the evolution of CRISPR-based adaptive immunity to provide protection against infectious agents. In class 2 CRISPR-Cas systems, a single 100-200 kilodalton (kD) CRISPR-associated (Cas) protein with multiple functional domains carried out RNA-guided binding and cutting of DNA or RNA substrates. To determine whether simpler, smaller RNA-guided proteins occurred in nature, terabase-scale metagenomic datasets were queried for uncharacterized genes proximal to both a CRISPR array and cas1, the gene that encoded the universal CRISPR integrase. This analysis identified a diverse family of CRISPR-Cas systems that contain cas1, cas2, cas4, and cas14, described herein, which encodes a 40-70 kD polypeptide (FIG. 8, Panel A). Twenty-four (24) different cas14 gene variants have been identified that cluster into three subgroups (Cas14a-c) based on comparative sequence analysis (FIG. 8, Panels A-B, FIG. 9, FIG. 10). Cas14 proteins were ~400-700 amino acids (aa), about half the size of previously known class 2 CRISPR RNA-guided enzymes (FIG. 8, Panels C-D). While the identified Cas14 proteins exhibited considerable sequence diversity, all were united by the presence of a predicted RuvC nuclease domain, whose organization was characteristic of Type V CRISPR-Cas DNA-targeting enzymes (FIG. 8, Panel D).

The identified Cas14 proteins occurred almost exclusively within DPANN, a super-phylum of symbiotic archaea characterized by small cell and genome sizes. Phylogenetic comparisons showed that Cas14 proteins were widely diverse with similarities to C2c10 and C2c9, families of bacterial RuvC-domain-containing proteins that were sometimes found near a CRISPR array but never together with other cas genes (FIG. 8, Panel B and FIG. 9). This observation and the small size of c2c10 and cas14 genes made it improbable that these systems could function as standalone CRISPR effectors.

FIG. 8, Panels A-D depict architecture and phylogeny of CRISPR-Cas14 genomic loci. FIG. 8, Panel A depicts a phylogenetic tree of Type V CRISPR systems. Newly identified miniature CRISPR systems are highlighted in orange. FIG. 8, Panel B depicts representative loci architectures for C2c10 and CRISPR-Cas14 systems. FIG. 8, Panel C depicts the length distribution of Cas14a-c systems compared to Cas12a-e and Cas9. FIG. 8, Panel D depicts the domain organization of Cas14a compared to Cas9 and Cas12a. Protein lengths are drawn to scale.

FIG. 9 depicts the maximum likelihood tree for known Type V CRISPR effectors and class 2 candidates containing a RuvC domain. Inset shows individual orthologs for each newly identified subtype. FIG. 10 depicts a maximum likelihood tree for Cas1 from known CRISPR systems.

Example 3

Based on their proximity to conserved genes responsible for creating genetic memory of infection (cas1, cas2, cas4) (FIG. 11, Panel A), it was explored whether CRISPR-Cas14 systems actively acquired DNA sequences into their CRISPR arrays. Assembled metagenomic contiguous DNA sequences (contigs) for multiple CRISPR-Cas14 loci revealed that otherwise identical CRISPR systems showed diversity in their CRISPR arrays, suggesting active adaptation to new infections (FIG. 11, Panel B and FIG. 12, Panel A). Without intending to be bound by any particular theory, it is proposed that the active acquisition of new DNA sequences indicated that these CRISPR-Cas14 loci encoded functional enzymes with nucleic acid targeting activity despite their small size. To test this possibility, it was investigated whether RNA components were produced from CRISPR-Cas14 loci. Environmental metatranscriptomic sequencing data were analyzed for the presence of RNA from the native archaeal host that contains CRISPR-Cas14a (FIG. 12, Panel B and FIG. 13, Panel A). In addition to CRISPR RNAs (crRNAs), a highly abundant non-coding RNA was mapped to about a 130-base pair sequence located between cas4a and the adjacent CRISPR array. The 20 nucleotides (nts) at the 3′ end of this transcript were mostly complementary to the repeat segment of the crRNA (FIG. 12, Panel C and FIG. 13, Panel B), as observed for trans-activating CRISPR RNAs (tracrRNAs) found in association with Cas9, Cas12b and Cas12e CRISPR systems. In these previously studied systems, the double-stranded-RNA-cutting enzyme Ribonuclease III (RNase III) generated mature tracrRNAs and crRNAs, but no genes encoding RNase III were present in cas14-containing reconstructed genomes (FIG. 14, Panel A). This observation implied that an alternative mechanism for CRISPR-associated RNA processing existed in these hosts.

To test whether the Cas14a proteins and associated RNA components could assemble in a heterologous organism, a plasmid containing a minimal CRISPR-Cas14a locus that included the Cas14 gene, the CRISPR array and intergenic regions containing the putative tracrRNA was introduced into E. coli. Affinity purification of the Cas14a protein from cell lysate and sequencing of co-purifying RNA revealed a highly abundant mature crRNA as well as the putative tracrRNA, suggesting that Cas14 associated with both crRNA and tracrRNA (FIG. 14, Panel B). The calculated mass of the assembled Cas14a protein-tracrRNA-crRNA particle was 48% RNA by weight compared to just 17% for S. pyogenes Cas9 (SpCas9) (FIG. 12, Panel D) and 8% for F. novicida Cas12a (FnCas12a), hinting at a central role of the RNA in the architecture of the Cas14a complex. Known class 2 CRISPR systems required a short sequence called a protospacer adjacent motif (PAM) to target double-stranded DNA (dsDNA). To test whether Cas14a required a PAM and could conduct dsDNA interference, E. coli expressing a minimal Cas14a locus was transformed with a dsDNA plasmid containing a randomized PAM region next to a sequence matching the target-encoding sequence (spacer) in the Cas14 array. No depletion of a PAM sequence was detected among E. coli transformants, suggesting that the CRISPR-Cas14a system was either unable to target dsDNA, could do so without requiring a PAM, or was inactive in this heterologous host (FIG. 15, Panels A-D).
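
The RNA-by-weight comparison above is a simple mass ratio. As a rough illustration, the Python sketch below estimates the RNA mass fraction of a ribonucleoprotein from sequence lengths, using average residue masses of about 110 Da per amino acid and about 330 Da per nucleotide. These residue masses are common approximations, the function name is hypothetical, and the lengths in the usage line are placeholders chosen for illustration, not values taken from this disclosure.

```python
AA_MASS_DA = 110.0  # approximate average mass of one amino acid residue
NT_MASS_DA = 330.0  # approximate average mass of one RNA nucleotide


def rna_mass_fraction(protein_len_aa, rna_lens_nt):
    """Fraction of total complex mass contributed by RNA.

    protein_len_aa: protein length in amino acids
    rna_lens_nt:    lengths (nt) of each RNA in the complex,
                    e.g. [tracrRNA_length, crRNA_length]
    """
    protein_mass = protein_len_aa * AA_MASS_DA
    rna_mass = sum(rna_lens_nt) * NT_MASS_DA
    return rna_mass / (protein_mass + rna_mass)


# Placeholder lengths for a small effector with a long tracrRNA and a crRNA.
example_fraction = rna_mass_fraction(500, [140, 40])
```

The estimate shows why a compact protein paired with a long tracrRNA yields a complex in which RNA accounts for a large share of the mass.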

FIG. 11, Panels A-B depict acquisition of new spacers by CRISPR-Cas14 systems. FIG. 11, Panel A depicts alignment of Cas14 Cas1 orthologs. Expansion shows conservation of previously implicated active site residues highlighted in red boxes. FIG. 11, Panel B depicts multiple CRISPR arrays assembled for various CRISPR-Cas14 systems revealing spacer diversity for these CRISPR systems. Orange arrows indicate repeats while variously colored boxes indicate unique spacers.

FIG. 12, Panels A-D depict that CRISPR-Cas14a actively adapts and encodes a tracrRNA. FIG. 12, Panel A depicts spacer diversity for Cas14a and Cas14b with CRISPR repeats diagramed in orange and unique spacers shown in different colors. FIG. 12, Panel B depicts metatranscriptomics reads mapped to Cas14a1 and Cas14a3. Inset shows expansion of most abundant repeat and spacer sequence. FIG. 12, Panel C depicts in silico predicted structure of Cas14a1 crRNA and tracrRNA. RNase III orthologs were not identified in host genomes (FIG. 14, Panel A). FIG. 12, Panel D depicts the fraction of the mass of various CRISPR complexes made up by RNA and by protein.

FIG. 13, Panels A-B depict metatranscriptomics for CRISPR-Cas14 loci. FIG. 13, Panel A depicts environmental RNA sequencing reads for Cas14a orthologs. Location of Cas14 and the CRISPR array indicated below. RNA structures to the right show the in silico predicted structure of the tracrRNA identified from metatranscriptomics. FIG. 13, Panel B depicts predicted hybridization for Cas14a1 crRNA:tracrRNA duplex.

FIG. 14, Panels A-B depict RNA processing and heterologous expression by CRISPR-Cas14. FIG. 14, Panel A depicts the presence of common RNase orthologs in Cas14-containing genomes. Light purple represents hits that were significantly shorter than the expected length for the given RNase. Note that RNase III is absent in all investigated genomes. FIG. 14, Panel B depicts small RNAseq reads from heterologous expression of Cas14a1 locus in E. coli (FIG. 14, Panel B, bottom two graphs) compared to metatranscriptomic reads (FIG. 14, Panel B, top graph). Pull down refers to RNA that copurified with Ni-NTA affinity purified Cas14a1.

FIG. 15, Panels A-D depict plasmid depletion by Cas14a1 and SpCas9. FIG. 15, Panel A depicts a diagram outlining a PAM discovery experiment. E. coli expressing the CRISPR system of interest was challenged with a plasmid containing a randomized PAM sequence flanking the target. The surviving (transformed) cells were harvested and sequenced along with a control harboring an empty vector. The depleted sequences were then sequenced and PAMs depleted more than the PAM Depletion Value Threshold (PDVT) were used to generate a Weblogo. FIG. 15, Panels B-D depict PAM sequences depleted by heterologously expressed Cas14a1 transformed with a target plasmid containing a randomized PAM sequence 5′ (FIG. 15, Panel B) or 3′ (FIG. 15, Panel C) of the target. “No sequences” indicated that no sequences were found to be depleted at or above the given PDVT.
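As a non-limiting illustration, the PAM depletion comparison outlined for FIG. 15, Panel A can be sketched as a fold-change filter between a control library and a challenged library. How the PDVT was computed is not specified herein; the sketch assumes it is a fold-depletion cutoff on normalized read frequencies, and the function name, pseudocount and example counts are hypothetical.

```python
def pam_depletion(control_counts, test_counts, pdvt, pseudocount=1.0):
    """Return {PAM: fold depletion} for PAMs depleted at or above `pdvt`.

    Assumption: fold depletion is the ratio of normalized control frequency
    to normalized test frequency, with a pseudocount to avoid division by zero.
    """
    control_total = sum(control_counts.values())
    test_total = sum(test_counts.values())
    depleted = {}
    for pam, control_n in control_counts.items():
        control_freq = (control_n + pseudocount) / control_total
        test_freq = (test_counts.get(pam, 0) + pseudocount) / test_total
        fold = control_freq / test_freq
        if fold >= pdvt:
            depleted[pam] = fold
    return depleted


# Hypothetical read counts for two PAM sequences.
control = {"TTTA": 100, "GGGC": 100}
challenged = {"TTTA": 9, "GGGC": 191}
hits = pam_depletion(control, challenged, pdvt=5)
print(hits if hits else "No sequences")
```

An empty result corresponds to the “No sequences” outcome, in which no PAM is depleted at or above the chosen threshold.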

Example 4

It was tested whether purified Cas14a-tracrRNA-crRNA complexes were capable of RNA-guided nucleic acid cleavage in vitro. All currently reconstituted DNA-targeting class 2 interference complexes were able to recognize both dsDNA and ssDNA substrates. Purified Cas14a-tracrRNA-crRNA complexes were incubated with radiolabeled target oligonucleotides (ssDNA, dsDNA, and ssRNA) bearing a 20-nucleotide sequence complementary to the crRNA guide sequence, or a non-complementary ssDNA, and these substrates were analyzed for Cas14a-mediated cleavage. Only in the presence of a complementary ssDNA substrate was any cleavage product detected (FIG. 16, Panel A and FIG. 17, Panels A-C), and cleavage was dependent on the presence of both tracrRNA and crRNA, which could also be combined into a single-guide RNA (sgRNA) (FIG. 16, Panel B and FIG. 18). The lack of detectable dsDNA cleavage suggested that Cas14a targeted ssDNA selectively, although it was possible that some other factor or sequence requirement could enable dsDNA recognition in the native host. Mutation of the conserved active site residues in the Cas14a RuvC domain eliminated cleavage activity (FIG. 17, Panel A), implicating RuvC as the domain responsible for DNA cutting. Moreover, Cas14a DNA cleavage was sensitive to truncation of the RNA components to lengths shorter than the naturally produced sequences (FIG. 16, Panel B and FIG. 19, Panels A-D). These results established Cas14a as the smallest class 2 CRISPR effector demonstrated to conduct programmable RNA-guided DNA cleavage.

It was tested whether Cas14a required a PAM for ssDNA cleavage in vitro by tiling Cas14a guides across a ssDNA substrate (FIG. 16, Panel C). Despite sequence variation adjacent to the targets of these different guides, cleavage was observed for all four sequences. The cleavage sites occurred beyond the guide-complementary region of the ssDNA and shifted in response to guide binding position (FIG. 16, Panel C). These data demonstrated that Cas14a was an ssDNA-targeting CRISPR endonuclease that did not require a PAM for activation.
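As a non-limiting illustration, tiling guide sequences across a ssDNA substrate can be sketched as sliding a window along the substrate and taking the RNA reverse complement of each window. The guide length, step size, function name and example sequence below are hypothetical and are not the guides used in FIG. 16, Panel C.

```python
# DNA base -> complementary RNA base.
RNA_COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}


def tile_guides(ssdna, guide_len=20, step=5):
    """Return (start, guide) pairs; each guide is the RNA reverse
    complement of one window of the ssDNA substrate."""
    guides = []
    for start in range(0, len(ssdna) - guide_len + 1, step):
        window = ssdna[start:start + guide_len]
        guide = "".join(RNA_COMPLEMENT[base] for base in reversed(window))
        guides.append((start, guide))
    return guides


# Hypothetical 40-nt substrate.
substrate = "ACGT" * 10
for start, guide in tile_guides(substrate):
    print(start, guide)
```

Because each window is flanked by different substrate sequence, comparing cleavage across the tiled guides probes whether any adjacent motif is required.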

Based on the observation that Cas14a cut outside of the crRNA/DNA targeting heteroduplex, it was proposed that Cas14a may possess target-activated non-specific ssDNA cleavage activity, similar to the RuvC-containing enzyme Cas12a. To test this possibility, Cas14a-tracrRNA-crRNA was incubated with a complementary activator DNA and an aliquot of M13 bacteriophage ssDNA bearing no sequence complementarity to the Cas14a crRNA or activator (FIG. 16, Panel D). The M13 ssDNA was rapidly degraded to small fragments, an activity that was eliminated by mutation of the conserved Cas14a RuvC active site, suggesting that activation of Cas14a resulted in non-specific ssDNA degradation.

To investigate the specificity of target-dependent non-specific DNA cutting activity by Cas14a, a fluorophore-quencher (FQ) assay was adapted in which cleavage of dye-labeled ssDNA generates a fluorescent signal (FIG. 20, Panel A). When Cas14a was incubated with various guide RNA-target ssDNA pairs, a fluorescent signal was observed only in the presence of the cognate target and showed strong preference for longer FQ-containing substrates (FIG. 19, Panel F and FIG. 20, Panel A). Cas14a mismatch tolerance was tested by tiling 2-nt mismatches across the targeted region in various ssDNA substrates. Mismatches near the middle of the ssDNA target strongly inhibited Cas14a activity, revealing an internal seed sequence that was distinct from the PAM-proximal seed region observed for dsDNA-targeting CRISPR-Cas systems (FIG. 20, Panel B and FIG. 21, Panels A-D). Moreover, DNA substrates containing strong secondary structure resulted in reduced activation of Cas14a (FIG. 21, Panel E). Truncation of ssDNA substrates also resulted in reduced or undetectable trans cleavage (FIG. 21, Panel F). These results suggested a mechanism of fidelity distinct from dsDNA-targeting class 2 CRISPR systems, possibly utilizing a preordered region of the crRNA to gate cleavage activity similarly to the RNA-targeting Cas13a enzymes.
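As a non-limiting illustration, a panel of substrates with 2-nt mismatches tiled across a targeted region can be generated as sketched below. The disclosure does not state which bases were substituted; replacing each base with its complement is an assumption made here so that every tiled position is guaranteed to mismatch, and the function name and example target are hypothetical.

```python
# Swap each base for its complement so the mutated positions always mismatch.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def tile_mismatches(target, width=2):
    """Return (start, variant) pairs, each with `width` consecutive bases
    of `target` replaced, stepping across the target without overlap."""
    variants = []
    for start in range(0, len(target) - width + 1, width):
        mutated = "".join(COMPLEMENT[b] for b in target[start:start + width])
        variants.append((start, target[:start] + mutated + target[start + width:]))
    return variants


# Hypothetical 20-nt target region.
target = "ACGTACGTACGTACGTACGT"
for start, variant in tile_mismatches(target):
    print(f"MM {start + 1}-{start + 2}: {variant}")
```

Measuring activation for each variant in the panel locates the positions at which mismatches are least tolerated, such as the internal seed region described above.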

Further investigation of compact Type V systems in metagenomic data revealed a large diversity of systems that, like Cas14a-c, include a gene encoding a short RuvC-containing protein adjacent to acquisition-associated cas genes and a CRISPR array. Twenty (20) additional such systems were found that cluster into five main families (Cas14d-h). These families seemed to have evolved from independent domestication events of TnpB, the transposase-associated protein implicated as the evolutionary parent of type V CRISPR effectors. Excluding cas14g, which was related to cas12b, the cas14-like genes formed separate clades on the type V effector phylogeny (FIG. 22, Panels A-B), and their cas1 genes had different origins (FIG. 10, Panel A). Altogether, 38 CRISPR-Cas14 systems belonging to eight families (Cas14a-h) were identified, along with eight additional systems that could not be clustered with the analysis (termed Cas14u, Table 3).

The small size of the Cas14 proteins described herein and their resemblance to type V effector proteins suggested that RNA-guided ssDNA cleavage may have existed as an ancestral class 2 CRISPR system. In this scenario, a small, domesticated TnpB-like ssDNA interference complex may have gained additional domains over time, gradually improving dsDNA recognition and cleavage. Smaller Cas9 orthologs exhibited weaker dsDNA-targeting activity than their larger counterparts but retained the ability to robustly cleave ssDNA. Aside from the evolutionary implications, the ability of Cas14 to specifically target ssDNA suggested a role in defense against ssDNA viruses or mobile genetic elements (MGEs) that propagated through ssDNA intermediates. Without intending to be bound by any particular theory, an ssDNA-targeting CRISPR system may be particularly advantageous in certain marine environments where ssDNA viruses comprised the vast majority of viral abundance.

FIG. 16, Panels A-D depict CRISPR-Cas14a as an RNA-guided DNA-endonuclease. FIG. 16, Panel A depicts cleavage kinetics of Cas14a1 targeting ssDNA, dsDNA, ssRNA and off-target ssDNA. FIG. 16, Panel B depicts a diagram of Cas14a RNP bound to target ssDNA and Cas14a1 cleavage kinetics of radiolabeled ssDNA in the presence of various RNA components. FIG. 16, Panel C depicts tiling of a ssDNA substrate by Cas14a1 guide sequences. FIG. 16, Panel D depicts cleavage of the ssDNA viral M13 genome with activated Cas14a1.

FIG. 17, Panels A-E depict degradation of ssDNA by Cas14a1. FIG. 17, Panel A depicts SDS-PAGE of purified Cas14a1 and Cas14a1 point mutants. FIG. 17, Panel B depicts optimization of salt, cation and temperature for Cas14a1 cleavage of ssDNA targets. FIG. 17, Panel C depicts cleavage of radiolabeled ssDNA by Cas14a1 with spacer sequences of various lengths. FIG. 17, Panel D depicts alignment of Cas14 with previously studied Cas12 proteins to identify RuvC active site residues and FIG. 17, Panel E depicts cleavage of ssDNA by purified Cas14a1 RuvC point mutants.

FIG. 18 depicts the kinetics of Cas14a1 cleavage of ssDNA with various guide RNA components.

FIG. 19, Panels A-F depict optimization of Cas14a1 guide RNA components. FIG. 19, Panel A depicts a diagram of Cas14a1 targeting ssDNA. FIG. 19, Panels B-E depict the impact on Cas14a1 cleavage of an FQ ssDNA substrate of varying the spacer length (FIG. 19, Panel B), repeat length (FIG. 19, Panel C), and tracrRNA (FIG. 19, Panel D), and of fusing the crRNA and tracrRNA together (FIG. 19, Panel E). FIG. 19, Panel F depicts a heat map showing the background-subtracted fluorescence resulting from cleavage of an ssDNA FQ reporter in the presence of various guide and target combinations.

FIG. 20, Panels A-E depict high fidelity ssDNA SNP detection by CRISPR-Cas14a. FIG. 20, Panel A depicts a fluorophore-quencher (FQ) assay for detection of ssDNA by Cas14a1 and the cleavage kinetics for various length FQ substrates. FIG. 20, Panel B depicts cleavage kinetics for Cas14a1 with mismatches tiled across the substrate (individual points represent replicate measurements). FIG. 20, Panel C depicts a diagram of Cas14-DETECTR strategy and HERC2 eye color SNP. FIG. 20, Panel D depicts titration of T7 exonuclease and impact on Cas14a-DETECTR. FIG. 20, Panel E depicts SNP detection using Cas14a-DETECTR with a blue-eye targeting guide for a blue-eyed and brown-eyed saliva sample compared to ssDNA detection using Cas12a.

FIG. 21, Panels A-F depict the impact of various activators on Cas14a1 cleavage rate. FIG. 21, Panel A depicts a diagram of Cas14a1 targeting of ssDNA with position of mismatches used in panels A-D and raw rates for representative replicates of mismatch (MM) position for Target 1. FIG. 21, Panels B-D depict cleavage rates for Cas14a targeting substrates with mutations tiled across three different substrates. FIG. 21, Panel E depicts trans cleavage rates for substrates with increasing amounts of secondary structure. FIG. 21, Panel F depicts trans cleavage rates with truncated substrates. Points represent individual measurements.

FIG. 22, Panels A-B depict diversity of CRISPR-Cas14 systems. FIG. 22, Panel A depicts representative locus architecture for indicated Cas14 systems. Protein lengths are drawn to scale. FIG. 22, Panel B depicts a maximum likelihood tree for Type V effectors including all eight identified subtypes of Cas14.

FIG. 23, Panels A-C depict a test of Cas14a1-mediated interference in a heterologous host. (A) Diagram of Cas14a1 and LbCas12a constructs to test interference in E. coli. (B) Plaques of ΦX174 spotted on E. coli revealing Cas12a- but not Cas14a1-mediated interference. Each spot represents a 10-fold dilution of the ΦX174 stock. (C) Growth curves of E. coli expressing Cas14a1 or LbCas12a infected with ΦX174 (T, targeting; NT, non-targeting). FIG. 19, Panel F shows a heat map showing the background-subtracted fluorescence resulting from cleavage of a ssDNA FQ reporter in the presence of various guide and target combinations after a 30-minute incubation.

While the present invention has been described with reference to the specific embodiments thereof, it should be understood by those skilled in the art that various changes may be made and equivalents may be substituted without departing from the true spirit and scope of the invention. In addition, many modifications may be made to adapt a particular situation, material, composition of matter, process, process step or steps, to the objective, spirit and scope of the present invention. All such modifications are intended to be within the scope of the claims appended hereto.


1-86. (canceled)
 87. A composition comprising a programmable nuclease having a length of no more than 900 amino acids, or a nucleic acid encoding the programmable nuclease, and a non-naturally occurring or engineered guide nucleic acid comprising a region that binds to the programmable nuclease and a guide sequence that is complementary to a target sequence of a target nucleic acid, or a nucleic acid encoding the guide nucleic acid.
 88. The composition of claim 87, further comprising a transactivating noncoding RNA.
 89. The composition of claim 87, further comprising the target nucleic acid.
 90. The composition of claim 87, wherein the target nucleic acid is single stranded DNA.
 91. The composition of claim 89, wherein the target nucleic acid is single stranded DNA.
 92. The composition of claim 90, wherein the target nucleic acid lacks a protospacer adjacent motif (PAM) sequence.
 93. The composition of claim 91, wherein the target nucleic acid lacks a PAM sequence.
 94. The composition of claim 87, wherein the target nucleic acid is double stranded DNA.
 95. The composition of claim 89, wherein the target nucleic acid is double stranded DNA.
 96. The composition of claim 87, wherein the target nucleic acid is a eukaryotic target DNA.
 97. The composition of claim 89, wherein the target nucleic acid is a prokaryotic target DNA.
 98. The composition of claim 87, wherein the target nucleic acid is a prokaryotic target DNA.
 99. The composition of claim 87, wherein the target nucleic acid is a viral target DNA.
 100. The composition of claim 89, wherein the target nucleic acid is a viral target DNA.
 101. The composition of claim 87, further comprising a donor polynucleotide.
 102. The composition of claim 87, further comprising a cell.
 103. The composition of claim 102, wherein the cell comprises the programmable nuclease and the non-naturally occurring or engineered guide nucleic acid.
 104. The composition of claim 102, wherein the cell is a eukaryotic cell.
 105. The composition of claim 87, wherein the programmable nuclease comprises three partial RuvC domains.
 106. The composition of claim 87, wherein the programmable nuclease comprises RuvC-I, RuvC-II, and RuvC-III subdomains.
 107. The composition of claim 87, wherein the programmable nuclease is a CasZ protein.
 108. The composition of claim 87, wherein the programmable nuclease is selected from a group consisting of a CasZa protein, a CasZb protein, a CasZc protein, a CasZd protein, a CasZe protein, a CasZf protein, a CasZg protein, a CasZh protein, a CasZi protein, a CasZj protein, a CasZk protein, and a CasZl protein.
 109. The composition of claim 87, wherein the programmable nuclease is a variant programmable nuclease with reduced nuclease activity compared to a corresponding wild type programmable nuclease.
 110. The composition of claim 87, wherein the programmable nuclease is a dead programmable nuclease.
 111. The composition of claim 87, wherein the programmable nuclease is conjugated to a heterologous moiety.
 112. The composition of claim 111, wherein the heterologous moiety is a heterologous polypeptide.
 113. The composition of claim 112, wherein the heterologous polypeptide comprises a nuclear localization signal.
 114. The composition of claim 87, wherein the programmable nuclease has a length of from 350 to 900 amino acids.
 115. The composition of claim 87, wherein the programmable nuclease has a length of no more than 800 amino acids.
 116. The composition of claim 87, wherein the guide nucleic acid is a guide RNA.